Add GNU Static Lib Tool, which supports the --emit-static-lib
flag. For HIP, a static library archive will be created and
consist of HIP Fat Binary host object with the device images embedded.
Using MC Directives, embed the device images and define symbols.
Using llvm-ar to create the static archive. Also, delete existing
output file to ensure a new archive is created each time.
Details
Diff Detail
Event Timeline
| clang/lib/Driver/ToolChain.cpp | ||
|---|---|---|
| 579 | What does ar do with the symbol table for this? I'd expect it to include host symbols but none of the device ones | |
| clang/lib/Driver/ToolChain.cpp | ||
|---|---|---|
| 579 | Also, can we use llvm-ar? It should reduce the amount of surprises we may run into with whatever ar we may find in the PATH. | |
| clang/lib/Driver/ToolChain.cpp | ||
|---|---|---|
| 579 | Hi Jon, the ar tool will embed the host objects, .o, and the device fat binary object, .hipfbo, into a global symbol __hip_fatbin. HIP runtime has the capability to extract this symbol, and digest the HIP fat binary, and extract the device symbols to run. Hi tra, I will update this to use llvm-ar, thanks. | |
Added GNU Static Lib Tool, which supports the --hip-emit-static-lib flag with -fgpu-rdc. A static library archive will be created and include both the host object and device object embedded.
What's going to be in that static library in the end? The description above is somewhat confusing. On one hand it says include both the host object and device object embedded.. Does it mean host object and device object to be embeddeed? I.e. two separate object files? Or does it mean that the device object is embedded in the host object and the host object is included in the library?
Using llvm-mc to embed the HIP fat binary into __hip_fatbin symbol, and system ar to archive. Also, delete existing output file to ensure a new archive with index is created each time.
__hip_fatbin, presumably in the host object. If the device object is embedded in the host object, why do we need to put it into the library? In the end theuser would only see the host object and we may just link it directly without making it a library. What do I miss?
It would be great to add some details to the test, because right now it's really hard to figure out what's being done and almost impossible to understand why it is done that way.
The test example is also odd in a sense that it compiles two source files. Is the purpose of this patch to allow creation of a self-contained archive with GPU-side objects that still need final linking step further down the build pipeline?
If that's indeed the driving motivation behind the patch, it may need some improvement, as it's unlikely to work well in practice, where pretty much every build system compiles one file at a time. Giving single compilation large number of inputs is simply not going to scale in practice. We need to figure out a way to implement the final link for object files compiled with -fgpu-rdc in separate compilations.
The purpose of this patch is to generate a static library that can be used as an "ordinary" static library, i.e., can be linked by a system linker, given a bunch of HIP programs that need -fgpu-rdc to compile.
This is a common use case for a library written with HIP. That is, the library provides a bunch of host APIs so that other C++ programs can use it. The library does not provide device code that need to be linked with the user's device code. The users want to use it as an ordinary host library.
If the users of the library just need dynamic library, then they do not need this feature, since clang is already able to do that.
However, clang is not able to generate a static library for that purpose. This is what this feature is trying to achieve.
So basically this patch will link the device code in these HIP programs, generate device ISA, and create a host object containing the ISA, and add the host object to the generated static library. This is exactly what clang does for -fgpu-rdc when generating a shared library. The only difference is that, instead of generating a shared library, now clang generates a static library.
Is the intention that all GPU-side code inside this library is linked together, even though individual sources are compiled to object files?
Is there a way to compile .hip ->.o with GPU-side object and then
link together GPU objects from multiple such .o files into a single .o with partially linked host object and fully linked GPU code?
If the users of the library just need dynamic library, then they do not need this feature, since clang is already able to do that.
Can you give me an example how this shared library is produced?
However, clang is not able to generate a static library for that purpose. This is what this feature is trying to achieve.
So basically this patch will link the device code in these HIP programs, generate device ISA, and create a host object containing the ISA, and add the host object to the generated static library. This is exactly what clang does for -fgpu-rdc when generating a shared library. The only difference is that, instead of generating a shared library, now clang generates a static library.
Can you give me a small example of using clang to generate such .so ? If this approach relies on compiling multiple *source* files at once, that will have practicall use issues. If it can be done on .o as inputs, then it may be OK.
When clang detects the .o files are clang-offload-bundle or host object, it will extract the bitcode and link them and generate ISA, then create a host object containing it, then link them together. We can link multiple objects together to create either a shared library or a static library (with this patch).
If the users of the library just need dynamic library, then they do not need this feature, since clang is already able to do that.
Can you give me an example how this shared library is produced?
In a ROCm project, RCCL is building the shared library using -fgpu-rdc. That project compiles sources using -c and -fgpu-rdc, and then links them together into a shared library. We require this patch to allow them to produce a static library too.
However, clang is not able to generate a static library for that purpose. This is what this feature is trying to achieve.
So basically this patch will link the device code in these HIP programs, generate device ISA, and create a host object containing the ISA, and add the host object to the generated static library. This is exactly what clang does for -fgpu-rdc when generating a shared library. The only difference is that, instead of generating a shared library, now clang generates a static library.
Can you give me a small example of using clang to generate such .so ? If this approach relies on compiling multiple *source* files at once, that will have practicall use issues. If it can be done on .o as inputs, then it may be OK.
Yes, this can be done on .o files. If you have multiple hip objects, foo.o and bar.o with their own device code, we can pass them into clang to generate a shared library file. clang currently extracts their bitcode, and links them together, and creates host objects containing the device code, and will link them together. It can accept multiple .o as inputs. With this patch, using the flag hip-emit-static-lib, we can generate a .a static library file.
The semantics of this patch are a bit unclear to me.
When linking a static archive, the linker searches for each currently unresolved symbol in the archive and links in only the object that exposes that symbol. Unused objects are left in the archive. That's useful behaviour in various settings, and overridden with --whole-archive when all the objects are wanted.
The behaviour I would expect, from a user perspective, is to link in the host code which satisfies unresolved symbols for the host, and the device code that satisfies unresolved symbols for the device, and no extra files, regardless of whether the files are machine code for either arch, bitcode for either arch, or an offload-bundle potentially containing some of each. That's what we have today for mixtures of bitcode and machine code for a single architecture. And extensible in the obvious fashion for > 2 architectures.
That seems implementable - skip over objects for architectures other than the one currently of interest, and reach into clang-offload-bundle if present to do the same thing. Optionally provide a symbol table per arch to make it faster.
I think you're suggesting that the bitcode from every object is extracted and linked together, not just the ones that are actually used. If so, do you also change the semantics for the host code part of the object? What about for host objects that don't contain any bitcode? What about bitcode files for the device that don't have a corresponding host part?
| clang/lib/Driver/ToolChain.cpp | ||
|---|---|---|
| 579 | I'm not really following you here. Longer comment at the top level. It seems a shame to spawn llvm-ar to make the archive. It's not a very complicated file format and we already have the implementation of llvm-ar in tree. Can we use it as a library instead of forking? | |
The use case of this patch is that the library only needs to provide host APIs which can be called by C++ programs but do not need to expose device code to the users. Therefore the device code in the library are linked, finalized, and embedded as ISA in host objects and not exposed to library users. We have user requests to support such use case.
There may be use cases that users want to store both host codes and device codes in a static library and expose both to users, however that is out of scope of this patch.
| clang/lib/Driver/ToolChain.cpp | ||
|---|---|---|
| 579 | Even if we can do that in-process, we still need the action and tool to represent that compilation step. Besides, llvm-ar does not support static library used by MSVC. If we want to extend this to MSVC toolchain, we need these API's. | |
Ah, OK. So it's HIP specific, and not a general purpose static archive. In that case, the many functions with names based on StaticLib should probably all be renamed HIPStaticLib or similar to indicate that it doesn't work for other toolchains. Can any of this code be moved into the HIP.cpp toolchain file?
It is not HIP specific. The use case is generic and common. Many cases users do not care about device code internals and only interested in using the functionality of a library through usual host APIs. If other toolchains would like to support the same use case they can easily extend it to their own use.
Also the implementation requires introduction of static lib tool in general since to get a tool it has to go through the generic interface of toolchain. There have already been all kinds of Tools and JobActions in the generic interface that are for specific toolchains, e.g. clang-offload-bundler, so this is not something new.
Regardless of how popular the use case might be, it's not building a static library. It uses the same file format, which seems adequate for the purpose, but with different semantics.
Can you split this patch into a function for building static libraries, which other toolchains could use unmodified, and some HIP specific code which makes use of that function?
No objection to another tool. The objection is to calling it 'static lib', on the basis that it isn't one, and other toolchains may want to build a conventional static library from the clang driver.
The current patch already does that. For other toolchains, this option will generate a normal static library, like ar does.
For HIP, it does something extra (link, finalize and embeds device code), then generate a static library for host object like ar does.
Maybe Aaron can add a lit test for using this option for C language.
This patch follows a generic design. It introduces generic option, JobAction, and Tool about creating static library which can be customized for each toolchain.
For linux toolchain, it is just the conventional ar, which is implemented in this patch.
For HIP toolchain, currently it implements a specific behavior to finalize device code and generate a common host static library. It is a static library from the host point of view.
This does not preventing it to be extended to generating other format of static libraries. We can easily introduce some options to generate static library in a different way, e.g. archive of clang-offload-bundler bundles, or clang-offload-bundler bundles of archives.
For other toolchains, they just need to override virtual functions getStaticLibTool and implement their own static lib tool to implement their own command to create static libs.
Some comments inline in StaticLibTool::ConstructJob
Unrelated to the above, but a consequence of reading the code a few times, there's a lot of duplication with tools::AddHIPLinkerScript that doesn't seem necessary.
| clang/lib/Driver/ToolChains/CommonArgs.cpp | ||
|---|---|---|
| 1427 | This code looked familiar - it seems to have a lot in common with AddHIPLinkerScript just above. Can the common parts be factored out? | |
| 1486 | There isn't a linker script involved here. Copied & pasted from above? | |
| clang/lib/Driver/ToolChains/Gnu.cpp | ||
| 350 | This function is named (and starts off) generic | |
| 373 | Then does some things which aren't obviously generic. What's special about -stdlib here? | |
| 392 | Finally does something that is definitely hip specific | |
Maybe Aaron can add a lit test for using this option for C language.
I can add a lit test for C language.
| clang/lib/Driver/ToolChains/CommonArgs.cpp | ||
|---|---|---|
| 1427 | Could re-factoring be in a separate commit? It may be significant and risky. I think there can be other optimizations made here. | |
| 1486 | Okay I will fix these comments, the comment can be file generator script. | |
| clang/lib/Driver/ToolChains/Gnu.cpp | ||
| 350 | These two lines were added for the D.Diag error checking below. | |
| 373 | This is a generic line from the Linker::ConstructJob function | |
| 392 | There is a check in this function which will skip this function if not HIP host toolchain above. // If this is not a HIP host toolchain, we don't need to do anything. if (!JA.isHostOffloading(Action::OFK_HIP)) return; | |
Few cosmetic nits. LGTM in general. I'll leave the approval to @JonChesterfield
| clang/lib/Driver/ToolChains/CommonArgs.cpp | ||
|---|---|---|
| 1417–1420 | You don't need to static_cast in order to do getTriple(), regular ToolChain should do the job. | |
| 1466 | Nit: Would it make sense to this if with the one above, so the file name construction is in one place for both names? | |
| 1473 | Nit: It's only used once. Fold it into MakeArgString()? | |
| clang/lib/Driver/ToolChains/Gnu.cpp | ||
| 348–349 | Nit: the cast may be unnecessary. Both getDriver() and AddLinkerInput() operate on base ToolChain. | |
| 379 | This is odd. IMO, if we report an error it should be due to an error. In this case we produce the diagnostic for something we didn't even try. Perhaps just remove these predictive checks altogether and rely on the error returned by fs::remove() below? | |
I don't like the copy and paste but at least it's contained within HIP specific functions.
The control flow in tools::gnutools::StaticLibTool::ConstructJob doesn't seem good though. It's a generic function that unconditionally calls a hip specific function which happens to return immediately in non-hip cases. That really should be a hip specific function calling the generic one, then doing more hip specific things afterwards.
However, @tra knows the clang code base and conventions rather better than I do, so I'm going to defer back.
+1. Indeed it would be great to reuse existing tool either by making it more flexible, or via extracting common things into a helper.
However, @tra knows the clang code base and conventions rather better than I do, so I'm going to defer back.
I may be too used to the odd things we have to do to deal with GPUs. I've pinged @echristo for the second opinion.
Thank you for your reviews, I've cleaned up the patch, and reduced the size of the StaticLib::ConstructJob.
There might require a lot of changes to make this more flexible, and making a hip specific function call generic functions. I think it may be out of the scope for this patch.
I think I've got a lot of the same concerns that Jon had. In addition I'd like to see some reasoning and abstraction if we're going to use this for creating archives in general. No need to make windows work, but instead could just return a "we can't do that yet" error instead?
| clang/lib/Driver/Driver.cpp | ||
|---|---|---|
| 3502–3506 | I'm really not a fan of this here. I think this needs to be abstracted a bit. | |
| clang/lib/Driver/ToolChains/CommonArgs.cpp | ||
| 1392 | This needs comments. | |
| 1416 | I'd really like to avoid a target specific assert here. | |
| 1427 | You can refactor ahead of time to make this a bit more clear? | |
Hi Eric, this patch is to support users who want to create a static library on the HIP language and ROCm architecture. I will add some comments to the code to make it more clear.
| clang/lib/Driver/Driver.cpp | ||
|---|---|---|
| 3502–3506 | Could you please provide some advice on how to make this more abstract? I am not familiar with this area of the code. | |
I've refactored the code, and removed the AddHIPLinkerScript function, which can be replaced by these MC directives approach.
| clang/lib/Driver/Driver.cpp | ||
|---|---|---|
| 3502–3506 | You may introduce a member function Driver::shouldEmitStaticLibrary() and probably change -hip-emit-static-lib to -emit-static-lib | |
Thank you for the tips, I've updated the code with Driver::ShouldEmitStaticLibrary function, and renamed to flag to a more general name, --emit-static-lib.
@echristo Aaron has addressed most of the comments. The major remaining issue is that the general linker
calls AddGenerateObjFileFromHIPFatBinary, which is supposed to be part of HIP
toolchain. This issue is not new, since the existing gnutools::Linker::ConstructJob
already did the same thing. Is it OK to land this patch and leave the refactoring of
AddGenerateObjFileFromHIPFatBinary to a separate patch? Thanks.
Hi @JonChesterfield and @echristo , I am opening a follow up commit that will move the linking logic into the HIP Toolchain rather than being in GNU.cpp. I will rebase this on top of master as well.
Removed the HIP Linker changes, that will come in a different patch. Keeping this change more general.
This appears to have been committed without addressing all the comments or waiting for an acceptance from someone outside of our organisation. That doesn't seem right - am I missing part of the thread here?
Hi Jon, I may have committed this in combination with D81963 as the parent commit. It should have waited for a LGTM from someone outside our org.
Please let me know unaddressed comments on this change, and I will fix them asap. Sorry about the confusion.
I'm really not a fan of this here. I think this needs to be abstracted a bit.