Page MenuHomePhabricator

[OpenMP] Add unbundling of archives containing bundled object files into device specific archives.
Needs ReviewPublic

Authored by jsjodin on May 29 2020, 9:54 AM.

Details

Summary

This patch adds the file type "a", which together with the --unbundle flag, unbundles an archive of bundled object files into an archive containing the device-specific code.
Example:
clang-offload-bundler --unbundle --inputs=libMyLib.a -type=a -outputs=libMyDeviceLib.a -targets=openmp-amdgcn-amdhsa-gfx906

Diff Detail

Event Timeline

jsjodin created this revision.May 29 2020, 9:54 AM
jdoerfert retitled this revision from Add unbundling of archives containing bundled object files into device specific archives. to [OpenMP] Add unbundling of archives containing bundled object files into device specific archives..Jun 3 2020, 7:36 AM

We need tests for this.

jdoerfert added a reviewer: dreachem.
jsjodin updated this revision to Diff 268220.Jun 3 2020, 8:41 AM

Full context diff.

yaxunl added inline comments.Jun 3 2020, 8:59 AM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
965

Need to add comments about what is in the input file and by what criteria the files are extracted and what format is the output file, does not contain index, etc.

jsjodin updated this revision to Diff 270201.Jun 11 2020, 12:09 PM

Added tests and a comment to the UnbundleArchive function to describe what it does.

grokos added inline comments.Jun 11 2020, 1:18 PM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
927–931

Does the K_GNU case cover all other OSes? What about Windows?
(we'd like to support Windows as well, like the SYCL branch does)

933–938

Can we add a default case which will return a generic extension (or exit with an "unsupported device" error)? Something like this:

if (Device.contains("gfx"))
  return ".bc";
else if (Device.contains("nvidia-cuda"))
  return ".cubin";

// Maybe also print a warning here
return ".o";

It will help identify things that need to be patched when a new architecture gets support in clang.

jsjodin marked 2 inline comments as done.Jun 11 2020, 5:44 PM
jsjodin added inline comments.
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
927–931

From what I could find out the PE/COFF variant is the same as K_GNU, so it should work.

933–938

Yes, I think that would be fine. I was also considering adding a new flag to specify the file extension instead of having this function. We may want to keep this kind of logic in the driver.

jsjodin updated this revision to Diff 271099.Jun 16 2020, 7:53 AM

Made matching for ".cubin" exension explicit, and created a fallback to ".o" with a warning.

jsjodin marked an inline comment as done.Jun 16 2020, 7:54 AM
jsjodin added inline comments.
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
933–938

I opted for now to keep this function and explicitly match for .cubin extension and have a fallback to .o with a warning.

The patch looks good now. I don't have any other comments. We can revisit the driver flag proposal for setting the file extension in the future.

clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
939–940

I think a space is missing between archive and members.

Fine from my side. @grokos I guess if you're happy you can accept.

ABataev added inline comments.Jul 1 2020, 9:56 AM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
132–140

Better to use Triple class for parsing triple rather than invent something new.

938–943

No need for else here

1044–1045

What about memory leak here?

keryell added a subscriber: keryell.Jul 6 2020, 9:30 AM

Adding Tyker (Gauthier Harnisch) because of SYCL similarity

jsjodin added inline comments.Aug 4 2020, 3:54 PM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
132–140

The use of Triple is pretty confusing in this tool, it is actually more than a triple. It is a pair of a llvm::Triple string and a device e.g. "gfx-906" or "sm_61" separated by a "-". I kept the convention of using Triple.

jdoerfert added inline comments.Aug 4 2020, 4:40 PM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
132–140

I argued that somewhere else before. Are we really sure it makes sense to somehow glue together two things so we don't have to add something new? We have a triple in the module, and if we need a module wide cpu/device identifier, let's add one? I mean, you basically gave the perfect argument of not doing it the way it is implemented: "The use of Triple is pretty confusing in this tool, it is actually more than a triple."
Also, having a global target cpu = "..." in the IR module seems beneficial for other reasons anyway.

jsjodin added inline comments.Aug 4 2020, 5:09 PM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
132–140

Having a target cpu in the module is probably a good idea, but it wouldn't directly solve this unless the llvm::Triple was modified to include the cpu/device. Together with the "kind" it is used to name the ELF sections for the bundling, so it is really 3 things put together that makes up the whole identifier.

jsjodin updated this revision to Diff 283110.Aug 4 2020, 7:25 PM

Fixed string memory leak, replaced if-then-else with conditional expression. Added check in case there is no device code in an archive child, and we can skip processing it.

jsjodin marked 3 inline comments as done.Aug 4 2020, 7:25 PM
jdoerfert added inline comments.Aug 4 2020, 8:30 PM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
132–140

Remind me, what is "kind" in this context.
Right, either llvm::Triple needs to hold the cpu/device or we need a new structure on the same level that does. That would match a new top-level module member as well.

jsjodin added inline comments.Aug 5 2020, 5:25 AM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
132–140

It is the offload kind, which is currently "host", "cuda", "hip" or "openmp".

Ping.

There's already some outstanding comments - maybe we leave this until you're back from vacation?

sdmitriev added inline comments.Nov 24 2020, 8:13 PM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
1063

I guess this patch needs to be clang-formatted.

1105–1106

What is the reason for limiting the number of targets and outputs to be extracted to one?

As @jsjodin is not working on this anymore, I have taken it up in a new patch, D93525 (didn't like the commandeering option).
New clang-formatted patch has a couple of new test cases to elaborate its working further.

All reviewers (and Jan) and subscribers have been copied as such.

clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
132–140

To summarize, the target is something like "openmp-amdgcn-amd-amdhsa-gfx906" and getDevice is being used to get the device (gfx906). So, it is a combination of "<kind>-<llvm-target>-<device>".

1063

Done in D93525

1105–1106

For our use case, getting a single device specific archive from this unbundle flag option is sufficient.
Please see new test cases at the end of clang-offload-bundler.c in D93525.

Can you give any use case(s) where we need to generate multiple device specific archives in one run of the tool?

sdmitriev added inline comments.Dec 18 2020, 3:34 AM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
1105–1106

I assume in general case unbundling for archives will be done in a similar way as for object files and for objects unbundling is currently done into multiple outputs in one run. There are many example of such use cases for object files in clang/test/Driver/openmp-offload.c. So, it seems natural to have the same functionality for archives as well.

sdmitriev added inline comments.Jan 4 2021, 12:53 AM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
1105–1106

We can probably reuse code for unbundling archives from https://github.com/intel/llvm branch. This implementation does not have one target/output limitation and it is based on the FileHandler interface as the rest of the file type handlers. I have prepared a draft for llvm.org which is based on the intel/llvm code - https://reviews.llvm.org/D94005

Thank you, @sdmitriev. I have updated the patch to handle multiple targets/outputs in one run of the tool (https://reviews.llvm.org/D93525?id=314633). Please have a look.