This is an archive of the discontinued LLVM Phabricator instance.

[OffloadPackager] Add option to extract files from images
ClosedPublic

Authored by jhuber6 on Jul 11 2022, 11:48 AM.

Details

Summary

We use the clang-offload-packager too bundle many files into a single
binary format containing metadata. This is used for offloading
compilation which may contain multiple device binaries of different
types and architectures in a single file. We use this special binary
format to store these files along with some necessary metadata around
them. We use this format because of the difficulty of determining the
filesize of the various binary inputs that will be passed to the
offloading toolchain rather than engineering a solution for each input.

Previously we only support packaing many files into a single binary.
This patch adds support for doing the reverse by using the same
--image= syntax. To unpackage a binary we now present an input file
instead of an output.

Diff Detail

Event Timeline

jhuber6 created this revision.Jul 11 2022, 11:48 AM
jhuber6 requested review of this revision.Jul 11 2022, 11:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 11 2022, 11:48 AM
jhuber6 added inline comments.Jul 11 2022, 11:49 AM
llvm/tools/llvm-objcopy/ObjcopyOpts.td
200

I think the syntax for this flag could be improved, users may want to dump all files matching a triple, architecture, etc. Let me know if you have any suggestions.

jhuber6 updated this revision to Diff 443705.Jul 11 2022, 11:57 AM

Leftover objdump in test

tra added a comment.Jul 11 2022, 12:24 PM

It would be great to allow dumping all offloading binaries at once. E.g. something like --dump-offloading all=file-name or just --dump-offloading file-name which would save all binaries into file-name.0 .. file-name.N.

llvm/tools/llvm-objcopy/ObjcopyOptions.cpp
865

--dump-section -> --dump-offloading ?

The feature doesn't fit into llvm-objcopy's purpose. llvm-objcopy is a binary manipulation tool which only understands ELF. --dump-offloading seems to parse the content using knowledge of the underlying format.

MaskRay added 1 blocking reviewer(s): MaskRay.Jul 11 2022, 12:28 PM
jhuber6 marked an inline comment as done.Jul 11 2022, 12:29 PM

The feature doesn't fit into llvm-objcopy's purpose. llvm-objcopy is a binary manipulation tool which only understands ELF. --dump-offloading seems to parse the content using knowledge of the underlying format.

I see, I simply chose it because it had a similar feature for --dump-sections. Should this be placed in llvm-objdump as well then?

It would be great to allow dumping all offloading binaries at once. E.g. something like --dump-offloading all=file-name or just --dump-offloading file-name which would save all binaries into file-name.0 .. file-name.N.

That's a good idea, I'll add it.

tra added a comment.Jul 11 2022, 12:47 PM

The feature doesn't fit into llvm-objcopy's purpose. llvm-objcopy is a binary manipulation tool which only understands ELF. --dump-offloading seems to parse the content using knowledge of the underlying format.

Which tool would be appropriate for something like this? Do you suggest that we need a new tool?

The "only understands ELF" part may be open to interpretation. If one were to implement an extension to ELF format, would objcopy be the right place to support it? While objdump does have knowledge of the underlying formats, it does not deal with extraction of such data, but rather interprets it for human consumption.
This use case is somewhere in between -- it involves binary data manipulation, which is what objcopy does, but it does require some knowledge of the underlying format.

IMO, the offloading packing format is closer to being an extension of ELF, than being yet another ISA to disassemble (i.e. it's simple, generic and unlikely to change much) and objcopy seems like a decent fit. Offloading data manipulation may eventually be extended to insert new offload binaries, too, which would also fit objcopy functionality nicely.

The feature doesn't fit into llvm-objcopy's purpose. llvm-objcopy is a binary manipulation tool which only understands ELF. --dump-offloading seems to parse the content using knowledge of the underlying format.

Will this feature be acceptable llvm-objcopy considering we already have code there? I'd like to know where you think this code should live before I put in the effort to make any changes.

MaskRay added a subscriber: avl.EditedJul 12 2022, 3:53 PM

The feature doesn't fit into llvm-objcopy's purpose. llvm-objcopy is a binary manipulation tool which only understands ELF. --dump-offloading seems to parse the content using knowledge of the underlying format.

Which tool would be appropriate for something like this? Do you suggest that we need a new tool?

The "only understands ELF" part may be open to interpretation. If one were to implement an extension to ELF format, would objcopy be the right place to support it? While objdump does have knowledge of the underlying formats, it does not deal with extraction of such data, but rather interprets it for human consumption.
This use case is somewhere in between -- it involves binary data manipulation, which is what objcopy does, but it does require some knowledge of the underlying format.

For example, @avl wanted to create a DWARF utility and I vaguely recall there was a discussion whether it applied to llvm-objcopy.
The answer is no as llvm-objcopy shouldn't know how to parse llvm-objcopy.

IMO, the offloading packing format is closer to being an extension of ELF, than being yet another ISA to disassemble (i.e. it's simple, generic and unlikely to change much) and objcopy seems like a decent fit. Offloading data manipulation may eventually be extended to insert new offload binaries, too, which would also fit objcopy functionality nicely.

It's not an extension of ELF. It is a new format with content embedded in an ELF section (according to while (Offset < Buffer->getBufferSize()) { and OffloadBinary::create). I don't really keep up-to-date with the already-plethora of GPU/offloading tools so I do not have an idea out of my head. If adding a new tool seems to be overkill, GPU/offloading folks may need to re-think how to reorganize the tools.
The question isn't raised for the first time why we have so many llvm-*/clang-* tools doing GPU/offloading stuff now.

MaskRay added a comment.EditedJul 12 2022, 3:54 PM

Note: llvm-objcopy --dump-section .text=text a.out rewrites a.out (the output is nearly indistinguishable, but the byte stream may be different). You need llvm-objcopy --dump-section .text=text a.out /dev/null to suppress the rewrite.
This means --dump-offloading likely doesn't have good ergonomics in llvm-objcopy.

tra added a comment.Jul 12 2022, 4:15 PM

The question isn't raised for the first time why we have so many llvm-*/clang-* tools doing GPU/offloading stuff now.

Well, it's a bit more complicated, but I do agree that offloading machinery we have is a bit of a mess and could use some streamlining.

That said, the issue at hand here is orthogonal to that. I'll take your word that llvm-objcopy is not the right place for offloading image manipulation.
That leaves some other tool.
The options I see are:

  • llvm-objdump -- may not be the best fit as we want to save offload binaries to files, not just print them out. Whether we may ever want to embed offloading binaries is an open question. I think it could be useful, but it's probably a very niche use case, even for those few of us who care what's under the hood in the offloading binaries.
  • incorporate functionality into clang-offload-packager.
  • write a new tool?

I personally would prefer to avoid creating yet another tool. clang-offload-packager may be a reasonable place for this, considering that it already deals with the new offloading binary packaging style.

It's not an extension of ELF. It is a new format with content embedded in an ELF section (according to while (Offset < Buffer->getBufferSize()) { and OffloadBinary::create). I don't really keep up-to-date with the already-plethora of GPU/offloading tools so I do not have an idea out of my head. If adding a new tool seems to be overkill, GPU/offloading folks may need to re-think how to reorganize the tools.
The question isn't raised for the first time why we have so many llvm-*/clang-* tools doing GPU/offloading stuff now.

FWIW I've pretty much deprecated three of those GPU/offloading tools and introduced two. We have a lot of these tools mostly because of legacy which I'm hoping will be removed in time. Generally the work I'm doing is noisy, but a net reduction in complexity. The features I'm adding here are not strictly necessary for functionality, but are meant to be convenient tools for users familiar with stuff like cuobjdump from Nvidia. I wouldn't think a new tool is necessary as the actual code required to support these features is relatively small and in-line with many other data formats embedded in an ELF.

Note: llvm-objcopy --dump-section .text=text a.out rewrites a.out (the output is nearly indistinguishable, but the byte stream may be different). You need llvm-objcopy --dump-section .text=text a.out /dev/null to suppress the rewrite.
This means --dump-offloading likely doesn't have good ergonomics in llvm-objcopy.

Noted, thanks.

I personally would prefer to avoid creating yet another tool. clang-offload-packager may be a reasonable place for this, considering that it already deals with the new offloading binary packaging style.

That tool is more or less a straight clone of Nvidia's fatbinary. We could potentially put it there, as it already creates the binaries that we embed inside the host, but it's a little out of scope. I can definitely put it there if people have major objections to putting stuff like this in llvm-objdump I could've made everything work without it, but it made the job of passing this information from the compiler driver to the back-end much more straightforward. Theoretically we could get rid of that and merge the clang-linker-wrapper into lld and bring the number of tools to zero.

I'm not familiar with some of the mentioned tools, but I tend to agree with @MaskRay's conclusions.

  1. The functionality doesn't really fit in llvm-objcopy because the tool is intended for modifying a file in-place or making a (modified) copy to another place, neither of which is what you'll actually want it for. I think it would be appropriate for extracting the offloading section from an ELF binary as a whole, since that would basically just be --dump-section. Ideally, we'd have the ability to extract by section type, not just section name, but that's getting out-of-scope for this discussion.
  1. llvm-objdump shouldn't be writing things to a file (or producing "binary" output that is intended for piping directly to a file). It's goal is printing information about object files. It is appropriate for it to be able to dump the offloading sections in an interpreted format, but not to extract them.

That means the functionality needs adding to a different tool, I feel (whether that tool is a new one or an existing one, I don't know).

I'm not familiar with some of the mentioned tools, but I tend to agree with @MaskRay's conclusions.

  1. The functionality doesn't really fit in llvm-objcopy because the tool is intended for modifying a file in-place or making a (modified) copy to another place, neither of which is what you'll actually want it for. I think it would be appropriate for extracting the offloading section from an ELF binary as a whole, since that would basically just be --dump-section. Ideally, we'd have the ability to extract by section type, not just section name, but that's getting out-of-scope for this discussion.
  1. llvm-objdump shouldn't be writing things to a file (or producing "binary" output that is intended for piping directly to a file). It's goal is printing information about object files. It is appropriate for it to be able to dump the offloading sections in an interpreted format, but not to extract them.

That means the functionality needs adding to a different tool, I feel (whether that tool is a new one or an existing one, I don't know).

llvm-objdump has the -s -j <secname> functionality, but that just dumps straight to the screen in hex format. I could put this in the clang-offload-packger, the tools that makes these blobs in the first place, but since it's a clang tool and doesn't care about object files I figured it wouldn't be a good fit. But if people are opposed to this kind of functionality in llvm-objdump, llvm-readobj, or llvm-objcopy I can just put it in there so it's at least available.

I'm not familiar with some of the mentioned tools, but I tend to agree with @MaskRay's conclusions.

  1. The functionality doesn't really fit in llvm-objcopy because the tool is intended for modifying a file in-place or making a (modified) copy to another place, neither of which is what you'll actually want it for. I think it would be appropriate for extracting the offloading section from an ELF binary as a whole, since that would basically just be --dump-section. Ideally, we'd have the ability to extract by section type, not just section name, but that's getting out-of-scope for this discussion.
  1. llvm-objdump shouldn't be writing things to a file (or producing "binary" output that is intended for piping directly to a file). It's goal is printing information about object files. It is appropriate for it to be able to dump the offloading sections in an interpreted format, but not to extract them.

That means the functionality needs adding to a different tool, I feel (whether that tool is a new one or an existing one, I don't know).

llvm-objdump has the -s -j <secname> functionality, but that just dumps straight to the screen in hex format. I could put this in the clang-offload-packger, the tools that makes these blobs in the first place, but since it's a clang tool and doesn't care about object files I figured it wouldn't be a good fit. But if people are opposed to this kind of functionality in llvm-objdump, llvm-readobj, or llvm-objcopy I can just put it in there so it's at least available.

I think none of llvm-objdump, llvm-readobj, or llvm-objcopy is applicable. I don't know the offloading ecosystem well, but it seems that if a tool already does something like extracting an offloading section, it seems natural for it to support --dump-offloading if such a debugging functionality is needed.
If clang-offload-packager seems misnomer, since the utility is still new, a rename can be considered. If its main code is in clang/lib, you can add the dump functionality in llvm/lib/Object (to sit beside code which parses/processes some binary formats) and let the tool call that code.

jhuber6 planned changes to this revision.Jul 14 2022, 10:33 AM

I think none of llvm-objdump, llvm-readobj, or llvm-objcopy is applicable. I don't know the offloading ecosystem well, but it seems that if a tool already does something like extracting an offloading section, it seems natural for it to support --dump-offloading if such a debugging functionality is needed.
If clang-offload-packager seems misnomer, since the utility is still new, a rename can be considered. If its main code is in clang/lib, you can add the dump functionality in llvm/lib/Object (to sit beside code which parses/processes some binary formats) and let the tool call that code.

Fair enough, I'll just stash this with the clang-offload-packager.

jhuber6 updated this revision to Diff 453354.Aug 17 2022, 11:13 AM

Moving to clang packager.

Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2022, 11:13 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
jhuber6 retitled this revision from [llvm-objcopy] Add option to dump embedded offloading images to [OffloadPackager] Add option to extract files from images.Aug 17 2022, 11:14 AM
jhuber6 edited the summary of this revision. (Show Details)
tra added inline comments.Aug 17 2022, 12:19 PM
clang/test/Driver/offload-packager.c
12–14 ↗(On Diff #453354)

So, with no -o option the tool treats --image to both choose the embedded image to dump and the file name to dump it into.
The patch appears to have more dumping modes. E.g. file name appears to be deriveable from the embedded image info. That should be tested, too.

Does user need to specify complete kind/triple/arch? Can I dump a subset of images -- e.g. all images with the same triple, or all with arch starting with sm_7?

It would also be very convenient if we could dump an image by its index in the executable and, also, all images. I suspect these two modes would be the ones used most often interactively.

BTW, we do not seem to have a way to list those embedded images. It would be great to add it.

15 ↗(On Diff #453354)

I'd use diff -q or maybe cmp, though I don't know if it's available on Windows. We only want to know if the files differ and do not want it to accidentally dump binary on the screen.

clang/tools/clang-offload-packager/ClangOffloadPackager.cpp
70 ↗(On Diff #453354)

Hooray for c++17!

168 ↗(On Diff #453354)

A comment would be useful here. Or conversion of --image=k1=v1,k2=v2,k3=v3... into a kN->vN map could be extracted into a helper function with a meaningful name.

193 ↗(On Diff #453354)

Nit: Args.count("file") ? Args["file"] : Saver.save(...) has one less !.

194–196 ↗(On Diff #453354)

Is this guaranteed to be unique? I'd add an image index to the name.

204 ↗(On Diff #453354)

You could use llvm::copy which works with ranges.

jhuber6 marked 3 inline comments as done.Aug 17 2022, 2:20 PM

Thanks for the comments.

clang/test/Driver/offload-packager.c
12–14 ↗(On Diff #453354)

So, with no -o option the tool treats --image to both choose the embedded image to dump and the file name to dump it into.
The patch appears to have more dumping modes. E.g. file name appears to be derivable from the embedded image info. That should be tested, too.

I tried to write a test for this. Given no explicit filename and multiple matches it should dump it as <input>-<triple>-<arch>.o or similar. It works when I test it locally but when I wrote a test for this I don't think it was writing to the correct directory because the lit test couldn't find them.

Does user need to specify complete kind/triple/arch? Can I dump a subset of images -- e.g. all images with the same triple, or all with arch starting with sm_7?

As written now, it should just write every image it finds that matches the image values. So --image=kind=openmp would dump everything whose producer was OpenMP. If it was --image=kind=openmp,file=out.o it would then write to out.o multiple times so you'd just end up with the last one. If you didn't provide a filename it would derive one as above. Not sure if it's worth handling this more intelligently since I think we should keep the user's requested name even if it's ambiguous.

It would also be very convenient if we could dump an image by its index in the executable and, also, all images. I suspect these two modes would be the ones used most often interactively.

BTW, we do not seem to have a way to list those embedded images. It would be great to add it.

That should be was llvm-objdump --offloading does, right? Or do you mean something different from just printing what's embedded.

clang/tools/clang-offload-packager/ClangOffloadPackager.cpp
70 ↗(On Diff #453354)

Indeed.

168 ↗(On Diff #453354)

I was thinking about that as it's somewhat obtuse, I'll try to make it clearer.

193 ↗(On Diff #453354)

The formatting looked a little weird when I did it that way. Not a huge deal however.

194–196 ↗(On Diff #453354)

Not necessarily, you're right that it might be good to use the image's index to provide a unique name in the case of conflicts. As mentioned in another comment I'm not sure if we should modify the output file if the user specified it however.

204 ↗(On Diff #453354)

Good point.

tra added inline comments.Aug 17 2022, 2:50 PM
clang/test/Driver/offload-packager.c
12–14 ↗(On Diff #453354)

lit test couldn't find them.

I see. The directory the test is run from is not necessarily the temporary directory. For that matter, such a test will fail in some setups where the source tree is read-only.
I think we need something like ar --output=dir which is used for very similar purposes.

Can I dump a subset of images ?

Given that we can narrow the subset to particular kind/triple/arch, it should do for now.

That should be was llvm-objdump --offloading does, right?

Now that lang-offload-packager got more functionality, it would make sense to make it the tool for listing embedded images, too, IMO.

clang/tools/clang-offload-packager/ClangOffloadPackager.cpp
194–196 ↗(On Diff #453354)

Agreed, user input takes precedence.

My concern was for the clashes caused by the tool itself. There may conceivably be embedded images with identical properties and we need a way to distinguish between them.

jhuber6 marked 2 inline comments as done.Aug 17 2022, 3:11 PM
jhuber6 added inline comments.
clang/test/Driver/offload-packager.c
12–14 ↗(On Diff #453354)

I think we need something like ar --output=dir which is used for very similar purposes.

That will work for now, I'll add that and update this patch with better testing.

Now that clang-offload-packager got more functionality, it would make sense to make it the tool for listing embedded images, too, IMO.

It shouldn't be too difficult to add since we'll be iterating through all of these already. I could do that in another patch.

clang/tools/clang-offload-packager/ClangOffloadPackager.cpp
194–196 ↗(On Diff #453354)

Yeah, I'll add .x to the filename for that reason.

jhuber6 updated this revision to Diff 453456.Aug 17 2022, 3:56 PM
jhuber6 edited the summary of this revision. (Show Details)

Update, still having problems making the test but I figured I'd just update now.

tra added a comment.Aug 17 2022, 4:22 PM

Update, still having problems making the test but I figured I'd just update now.

You could try something like this:
https://github.com/llvm/llvm-project/blob/d20e632853ad978e8008747d838d51b51d823a53/lld/test/MachO/stabs.s#L48

# RUN: cd %t && %lld -lSystem test.o foo.o no-debug.o -o test
jhuber6 updated this revision to Diff 454106.Aug 19 2022, 1:59 PM

Using @tra's suggestion to use cd. I had to make the test not apply to Windows
however, since I had to use realpath. But we don't support Windows anyway.

tra accepted this revision.Aug 19 2022, 2:23 PM
tra added inline comments.
clang/test/Driver/offload-packager.c
26 ↗(On Diff #454106)

I'd also check that only openmp files got extracted. You may also want to delete all files in the temp directory -- I'm not sure if lit guarantees that it will be clean. I've seen other tests manually cleaning it in some cases.

This revision is now accepted and ready to land.Aug 19 2022, 2:23 PM
jhuber6 updated this revision to Diff 454108.Aug 19 2022, 2:30 PM

Updating documentation, cleaning up, and adjusting tests.

This revision was automatically updated to reflect the committed changes.
bader added a subscriber: bader.Mar 3 2023, 10:01 PM
bader added inline comments.
clang/test/Driver/offload-packager.c
2–3 ↗(On Diff #454808)

Are nvptx and amdgpu target required for this test?
Latest version of the test invokes clang only for x86 target and clang-offload-packager just adds triple as metadata string without using llvm target. Right?

jhuber6 added inline comments.Mar 4 2023, 5:40 AM
clang/test/Driver/offload-packager.c
2–3 ↗(On Diff #454808)

You're right, we could get rid of those. Also this made me realize that the test for the bitcode file isn't actually using bitcode, I should probably fix that.