This is an archive of the discontinued LLVM Phabricator instance.

[clang-offload-bundler] Enable handling of partially-linked fat objects
Abandoned, Public

Authored by grokos on Feb 7 2020, 3:54 PM.

Details

Summary

This is the bundler-side patch for enabling static library support in clang. The scheme has been discussed extensively in the past and is described in a document prepared by @sdmitriev.

The patch was developed in collaboration with @kbobrovs, and a similar version has been merged into Intel's SYCL compiler (https://github.com/intel/llvm/tree/sycl).

When a fat object is created, the bundler also creates, for each bundle, a corresponding "size" section consisting of a single 64-bit integer that stores the size of the bundle. When linking from static objects, the host linker fetches all dependencies and partially links them; this concatenates all sections with the same name across the fetched dependencies into a new aggregate section. For each target there will therefore be one aggregate section containing the concatenated bundles and another containing the concatenated sizes. By walking the aggregate sizes section, the unbundler can then split the aggregate bundle into separate output device objects.
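
The splitting step described above can be sketched as follows. This is an illustration only, not the patch's code: the function name `split_aggregate_bundle` is hypothetical, and the sketch assumes the sizes are little-endian unsigned 64-bit integers and that the linker inserts no padding between the concatenated bundles. Neither assumption is stated in the summary.

```python
import struct


def split_aggregate_bundle(bundle: bytes, sizes: bytes) -> list[bytes]:
    """Split a concatenated bundle section using a concatenated sizes section.

    Assumptions (not confirmed by the review): each size is a little-endian
    unsigned 64-bit integer, and bundles are laid out back to back with no
    alignment padding in between.
    """
    if len(sizes) % 8 != 0:
        raise ValueError("sizes section is not a whole number of 64-bit integers")

    count = len(sizes) // 8
    lengths = struct.unpack(f"<{count}Q", sizes)

    if sum(lengths) != len(bundle):
        raise ValueError("recorded sizes do not add up to the bundle section size")

    parts = []
    offset = 0
    for length in lengths:
        # Each recorded size marks where one original device object ends.
        parts.append(bundle[offset:offset + length])
        offset += length
    return parts
```

The two checks mirror the failure cases an unbundler would have to report: a malformed sizes section, and sizes that disagree with the bundle they describe.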

The patch introduces a new type, "oo", which is used when unbundling partially-linked fat objects. When "oo" is specified, the output file is not an object file itself; instead it is a text file containing the paths to the actual outputs (because we may have multiple device objects as outputs, one for each dependency that was fetched).
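
A minimal sketch of what producing such a list file could look like is shown below. The function name `write_unbundled_outputs`, the `device-N.o` naming, and the one-path-per-line layout are all assumptions made for illustration; the summary only says the file contains the paths to the actual outputs.

```python
from pathlib import Path


def write_unbundled_outputs(parts, list_file, out_dir):
    """Write each device object to its own file and record the paths.

    Assumptions (hypothetical, not taken from the patch): outputs are named
    device-<index>.o, and the list file holds one path per line.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    for index, data in enumerate(parts):
        obj_path = out_dir / f"device-{index}.o"
        obj_path.write_bytes(data)
        paths.append(obj_path)

    # The "output file" the user asked for is only an index of the real outputs.
    Path(list_file).write_text("".join(f"{path}\n" for path in paths))
    return paths
```

Writing an index file keeps the command line shape unchanged (one output per target) even though the number of device objects is not known in advance.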

Invocation of the host linker (to do partial-linking) and cleanup of temporary files will be done by the Driver. Once the bundler patch lands, the Driver patch will follow.

Event Timeline

grokos created this revision. Feb 7 2020, 3:54 PM

Partial linking may lead to some incorrect results with global constructors. How are you going to handle this?

clang/test/Driver/clang-offload-bundler-missing-size-section.cpp
1–44

Very strange test. It should not contain standard includes. Also, it should not be an executable test; you have to check the driver output or something similar.

clang/test/Driver/clang-offload-bundler-oo.cpp
1–18

Same about this test

clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
84

Hmm, are you going to introduce a new kind of output? It really requires RFC.

(I'll try to actually review this later but I left a comment below)

clang/test/Driver/clang-offload-bundler-missing-size-section.cpp
1–44

We have executable tests in the OpenMP target offloading part, if it cannot live with clang it can live there.

jdoerfert added inline comments. Feb 13 2020, 7:53 AM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
84

This is the offload-bundler tool, right? Who is using that except OpenMP (and SYCL)?

Is there a reason for oo? Why not uo (=unbundled object), or do (=device object)?

160

I don't understand the comment. If \p FileName is a list of outputs, how does this work?

grokos marked 2 inline comments as done. Feb 14 2020, 10:53 AM

Partial linking may lead to some incorrect results with global constructors. How are you going to handle this?

Can you give me an example of what can break? I remember reading a conversation about some linker patch some time ago but I cannot recall the details.

clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
84

No one else (at least for now). But I can send out an RFC regarding the new output anyway.

oo is related to the fact that under this scheme we can have multiple .o files as output (many o's). But if you think one of the other abbreviations makes more sense, I'm happy to change it.

160

The scheme is described in the attached pdf. In short, when the host linker fetches dependencies from a static library, alongside the host bundle it also fetches the device bundle. Now, if we have multiple dependencies from multiple objects inside a static library (or multiple static libraries), the host linker will perform a partial linking between all fetched bundles for the targets we are interested in. The result is a fat object in which each target bundle is the result of concatenating the individual bundles for that target we fetched from each static library. We also keep track of the size of each fetched bundle (we use a new sizes section per target inside the fat object for this purpose) so that the unbundler can separate the partially-linked bundle into the original object files it was assembled from. Usually, we don't know a priori how many dependencies will be brought in, so we don't know how many objects we're going to have as outputs. Therefore, in oo unbundling mode, the user specifies a single output file per target (just like in any other unbundling mode), which the unbundler populates with the paths to the actual output device objects. Then the driver reads those paths and passes them on to the device linker.
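
For illustration, the last step (the driver reading the paths and handing them to the device linker) might look roughly like the sketch below. The linker name `device-linker`, the `-o` output flag, and the one-path-per-line file layout are placeholders assumed for this example; the actual Driver patch was not part of this revision.

```python
from pathlib import Path


def device_link_command(list_file, linker="device-linker", output="device.out"):
    """Build a device link command from an oo-mode list file.

    Assumptions (hypothetical): the list file holds one object path per line,
    and the device linker accepts "<objects...> -o <output>".
    """
    lines = Path(list_file).read_text().splitlines()
    # Ignore blank lines so a trailing newline does not create an empty argument.
    objects = [line for line in lines if line.strip()]
    return [linker, *objects, "-o", output]
```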

Partial linking may lead to some incorrect results with global constructors. How are you going to handle this?

Can you give me an example of what can break? I remember reading a conversation about some linker patch some time ago but I cannot recall the details.

See the discussion here https://reviews.llvm.org/D65819

jsjodin added a subscriber: jsjodin. Apr 8 2020, 9:11 AM
grokos abandoned this revision. May 6 2020, 6:10 PM

The partial linking scheme has been found not to work correctly in all cases (it fails when we have libraries with device code only). A new patch, based on archive extraction, will be uploaded.