Page MenuHomePhabricator

[OpenMP] Add unbundling of archives containing bundled object files into device specific archives.
Needs ReviewPublic

Authored by jsjodin on May 29 2020, 9:54 AM.



This patch adds the file type "a", which together with the --unbundle flag, unbundles an archive of bundled object files into an archive containing the device-specific code.
clang-offload-bundler --unbundle --inputs=libMyLib.a -type=a -outputs=libMyDeviceLib.a -targets=openmp-amdgcn-amdhsa-gfx906

Diff Detail

Event Timeline

jsjodin created this revision.May 29 2020, 9:54 AM
jdoerfert retitled this revision from Add unbundling of archives containing bundled object files into device specific archives. to [OpenMP] Add unbundling of archives containing bundled object files into device specific archives..Jun 3 2020, 7:36 AM

We need tests for this.

jdoerfert added a reviewer: dreachem.
jsjodin updated this revision to Diff 268220.Jun 3 2020, 8:41 AM

Full context diff.

yaxunl added inline comments.Jun 3 2020, 8:59 AM

Need to add comments about what is in the input file and by what criteria the files are extracted and what format is the output file, does not contain index, etc.

jsjodin updated this revision to Diff 270201.Jun 11 2020, 12:09 PM

Added tests and a comment to the UnbundleArchive function to describe what it does.

grokos added inline comments.Jun 11 2020, 1:18 PM

Does the K_GNU case cover all other OSes? What about Windows?
(we'd like to support Windows as well, like the SYCL branch does)


Can we add a default case which will return a generic extension (or exit with an "unsupported device" error)? Something like this:

if (Device.contains("gfx"))
  return ".bc";
else if (Device.contains("nvidia-cuda"))
  return ".cubin";

// Maybe also print a warning here
return ".o";

It will help identify things that need to be patched when a new architecture gets support in clang.

jsjodin marked 2 inline comments as done.Jun 11 2020, 5:44 PM
jsjodin added inline comments.

From what I could find out the PE/COFF variant is the same as K_GNU, so it should work.


Yes, I think that would be fine. I was also considering adding a new flag to specify the file extension instead of having this function. We may want to keep this kind of logic in the driver.

jsjodin updated this revision to Diff 271099.Jun 16 2020, 7:53 AM

Made matching for ".cubin" exension explicit, and created a fallback to ".o" with a warning.

jsjodin marked an inline comment as done.Jun 16 2020, 7:54 AM
jsjodin added inline comments.

I opted for now to keep this function and explicitly match for .cubin extension and have a fallback to .o with a warning.

The patch looks good now. I don't have any other comments. We can revisit the driver flag proposal for setting the file extension in the future.


I think a space is missing between archive and members.

Fine from my side. @grokos I guess if you're happy you can accept.

ABataev added inline comments.Jul 1 2020, 9:56 AM

Better to use Triple class for parsing triple rather than invent something new.


No need for else here


What about memory leak here?

keryell added a subscriber: keryell.Jul 6 2020, 9:30 AM

Adding Tyker (Gauthier Harnisch) because of SYCL similarity

jsjodin added inline comments.Aug 4 2020, 3:54 PM

The use of Triple is pretty confusing in this tool, it is actually more than a triple. It is a pair of a llvm::Triple string and a device e.g. "gfx-906" or "sm_61" separated by a "-". I kept the convention of using Triple.

jdoerfert added inline comments.Aug 4 2020, 4:40 PM

I argued that somewhere else before. Are we really sure it makes sense to somehow glue together two things so we don't have to add something new? We have a triple in the module, and if we need a module wide cpu/device identifier, let's add one? I mean, you basically gave the perfect argument of not doing it the way it is implemented: "The use of Triple is pretty confusing in this tool, it is actually more than a triple."
Also, having a global target cpu = "..." in the IR module seems beneficial for other reasons anyway.

jsjodin added inline comments.Aug 4 2020, 5:09 PM

Having a target cpu in the module is probably a good idea, but it wouldn't directly solve this unless the llvm::Triple was modified to include the cpu/device. Together with the "kind" it is used to name the ELF sections for the bundling, so it is really 3 things put together that makes up the whole identifier.

jsjodin updated this revision to Diff 283110.Aug 4 2020, 7:25 PM

Fixed string memory leak, replaced if-then-else with conditional expression. Added check in case there is no device code in an archive child, and we can skip processing it.

jsjodin marked 3 inline comments as done.Aug 4 2020, 7:25 PM
jdoerfert added inline comments.Aug 4 2020, 8:30 PM

Remind me, what is "kind" in this context.
Right, either llvm::Triple needs to hold the cpu/device or we need a new structure on the same level that does. That would match a new top-level module member as well.

jsjodin added inline comments.Aug 5 2020, 5:25 AM

It is the offload kind, which is currently "host", "cuda", "hip" or "openmp".


There's already some outstanding comments - maybe we leave this until you're back from vacation?

sdmitriev added inline comments.Nov 24 2020, 8:13 PM

I guess this patch needs to be clang-formatted.


What is the reason for limiting the number of targets and outputs to be extracted to one?

As @jsjodin is not working on this anymore, I have taken it up in a new patch, D93525 (didn't like the commandeering option).
New clang-formatted patch has a couple of new test cases to elaborate its working further.

All reviewers (and Jan) and subscribers have been copied as such.


To summarize, the target is something like "openmp-amdgcn-amd-amdhsa-gfx906" and getDevice is being used to get the device (gfx906). So, it is a combination of "<kind>-<llvm-target>-<device>".


Done in D93525


For our use case, getting a single device specific archive from this unbundle flag option is sufficient.
Please see new test cases at the end of clang-offload-bundler.c in D93525.

Can you give any use case(s) where we need to generate multiple device specific archives in one run of the tool?

sdmitriev added inline comments.Dec 18 2020, 3:34 AM

I assume in general case unbundling for archives will be done in a similar way as for object files and for objects unbundling is currently done into multiple outputs in one run. There are many example of such use cases for object files in clang/test/Driver/openmp-offload.c. So, it seems natural to have the same functionality for archives as well.

sdmitriev added inline comments.Jan 4 2021, 12:53 AM

We can probably reuse code for unbundling archives from branch. This implementation does not have one target/output limitation and it is based on the FileHandler interface as the rest of the file type handlers. I have prepared a draft for which is based on the intel/llvm code -

Thank you, @sdmitriev. I have updated the patch to handle multiple targets/outputs in one run of the tool ( Please have a look.