This is an archive of the discontinued LLVM Phabricator instance.

[llvm-shlib] Build backend libraries as loadable modules
Needs Review · Public

Authored by tstellar on Dec 15 2020, 3:21 PM.

Details

Summary

Do not include the LLVM backend libraries in libLLVM.so; instead,
build them as modules that can be dlopen'd at runtime. This helps keep
the memory footprint of libLLVM.so smaller, since in most cases only
one backend is in use at any given time.

I chose not to add a CMake option to enable/disable this feature,
because I don't want to add another set of configuration permutations
that we need to support.

Binary size comparison:

Without Backend Modules:

117M libLLVM-12git.so

With Backend Modules:

60M libLLVM-12git.so
14M LLVMAMDGPUBackend.so
12M LLVMX86Backend.so
8.5M LLVMAArch64Backend.so
7.2M LLVMARMBackend.so
6.7M LLVMHexagonBackend.so
3.6M LLVMPowerPCBackend.so
3.6M LLVMMipsBackend.so
2.2M LLVMSystemZBackend.so
2.1M LLVMNVPTXBackend.so
2.0M LLVMRISCVBackend.so
1.8M LLVMWebAssemblyBackend.so
881K LLVMSparcBackend.so
732K LLVMBPFBackend.so
730K LLVMAVRBackend.so
721K LLVMLanaiBackend.so
704K LLVMXCoreBackend.so
579K LLVMMSP430Backend.so

126M Total

Diff Detail

Event Timeline

tstellar created this revision.Dec 15 2020, 3:21 PM
tstellar requested review of this revision.Dec 15 2020, 3:21 PM
Herald added a project: Restricted Project.Dec 15 2020, 3:21 PM
llvm/tools/llvm-shlib/libllvm.cpp
21

It's probably safer to just use an `llvm::raw_string_ostream` here, as your code breaks if a Target has a very long name.
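
For illustration, a minimal sketch of that suggestion, with hypothetical names (this is not the patch's actual code):

```cpp
// Sketch: build the backend module name with a growable string stream
// instead of a fixed-size char buffer, so a long target name cannot
// overflow. The function name and layout are assumptions.
#include "llvm/Support/raw_ostream.h"
#include <string>

static std::string getBackendLibName(const char *TargetName) {
  std::string Buf;
  llvm::raw_string_ostream OS(Buf);
  OS << "LLVM" << TargetName << "Backend.so";
  return OS.str(); // flushes the stream into Buf
}
```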

40

Not all targets have all of these features, so I guess it's fine to ignore the errors when loading fails(?). But some are mandatory, so maybe split the mandatory and non-mandatory components?

I love the idea. Many thanks for implementing it!
I think it should go in the release notes.

Do you have some benchmark results for this?
(hyperfine helps with this: https://github.com/sharkdp/hyperfine)

llvm/tools/llvm-shlib/libllvm.cpp
21

Or llvm::format btw
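
For reference, the same sketch using llvm::format instead, again with hypothetical names:

```cpp
// Sketch: llvm::format streams printf-style formatting into a
// raw_ostream, so the result is also not bounded by a fixed buffer.
#include "llvm/Support/Format.h"
#include "llvm/Support/raw_ostream.h"
#include <string>

static std::string getBackendLibName(const char *TargetName) {
  std::string Buf;
  llvm::raw_string_ostream OS(Buf);
  OS << llvm::format("LLVM%sBackend.so", TargetName);
  return OS.str();
}
```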

dmajor added a subscriber: dmajor.Dec 16 2020, 9:01 AM
llvm/tools/llvm-shlib/CMakeLists.txt
34

I am somewhat surprised this works -- I usually try to avoid this kind of layering issue, because while it can work, it usually has weird edge cases.

In this case, each of your target backends will directly embed (and export!) its dependencies, which are probably also embedded and exported by the libLLVM that loads it. Aside from bloat, this can cause issues with module statics being duplicated, depending on how the child library references things in the module (this tends to manifest as cl::opt conflicts in the Support library at runtime). Based on the sizes you report, I suspect this is adding roughly a megabyte to each of your backends (completely guessing based on what I have seen).
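
(A hypothetical illustration of that failure mode, assuming LLVMSupport ends up linked into both libLLVM.so and a backend module; the option name is made up:)

```cpp
// Hypothetical: a cl::opt static compiled into LLVMSupport. If Support
// is linked into both libLLVM.so and a dlopen'd backend module, this
// constructor runs once per copy, and CommandLine rejects the second
// registration at load time with an error along the lines of
// "Option 'example-flag' registered more than once!".
#include "llvm/Support/CommandLine.h"

static llvm::cl::opt<bool>
    ExampleFlag("example-flag", llvm::cl::desc("Illustrative option"));
```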

There are two ways to layer this better:

  • arrange for the dependencies to always exist in libLLVM and then just don't include them here. Because you are dlopen'ing from there, this should work on Linux, even without an ld-level link. Might not work on macOS. Can't work on Windows.
  • my rework of the cmake bits for libLLVM layers this properly but will take some time/consensus. I feel like that is the better approach, but imo, if this works for the cases you're targeting and stems some issues, I'm not opposed.
36

Is there a reason why we are using the "backend" nomenclature here for each target? I had looked into doing this with my cmake upgrades and had been thinking that a name like LLVMX86Target was more consistent.

llvm/tools/llvm-shlib/libllvm.cpp
21

I don't think you want to be depending on anything from support in this kind of shim loader, so I would stick with libc string manipulation, but +1 on making it safer.
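
(A sketch of what the libc-only variant could look like; names are hypothetical:)

```cpp
// Sketch: snprintf returns the length the output would have needed,
// so truncation from an unexpectedly long target name can be detected
// instead of silently producing a wrong module name.
#include <stdio.h>

static bool getBackendLibName(const char *TargetName, char *Buf,
                              size_t BufSize) {
  int N = snprintf(Buf, BufSize, "LLVM%sBackend.so", TargetName);
  return N >= 0 && (size_t)N < BufSize; // false on error or truncation
}
```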

tstellar added inline comments.Jan 4 2021, 11:10 AM
llvm/tools/llvm-shlib/CMakeLists.txt
34

There are two ways to layer this better:

  • arrange for the dependencies to always exist in libLLVM and then just don't include them here. Because you are dlopen'ing from there, this should work on Linux, even without an ld-level link. Might not work on macOS. Can't work on Windows.
  • my rework of the cmake bits for libLLVM layers this properly but will take some time/consensus. I feel like that is the better approach, but imo, if this works for the cases you're targeting and stems some issues, I'm not opposed.

If you have a better solution, that is great. Right now, I'm not in a rush to get this in.

36

LLVMX86Target is fine with me.

llvm/tools/llvm-shlib/CMakeLists.txt
34

I don't want to hold up progress, but I do think that the approach I'm advocating in this RFC will get us the library organization we want and will work across platforms/downstreams/etc.: http://lists.llvm.org/pipermail/llvm-dev/2021-January/147567.html

Modulo the name of the target .so file, what I prototyped there already breaks the libraries up in roughly this way. Actually wiring it up for dynamically loading the target libraries vs hard linking to them would be similar to what you have here (the loader cpp file would be almost the same), but we would likely need to put it somewhere else (into its own component). If we go forward with my RFC, I can just design that in as an incremental thing.

There are some risks with my approach, mainly around timing and my availability to finish it. Given the number of moving pieces and the age of the code, I could easily see the transition taking weeks to a small number of months to land (it isn't that much work but needs to be sequenced carefully/thoughtfully). Like I said, I don't want to block progress, but it would be somewhat easier to add this feature to what I'm doing than to land this patch first, revert it, and re-implement it.

tstellar marked an inline comment as done.Jul 27 2021, 3:53 PM
tstellar added inline comments.
llvm/tools/llvm-shlib/libllvm.cpp
21

Maybe we should add a test to check for a long target name? Otherwise I'm not sure exactly how to improve this.

40

So do you think we should print an error message if a mandatory feature fails to load but do nothing if a non-mandatory feature fails?
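
(For illustration, one way such a split could look; the component lists, symbol names, and error handling here are assumptions, not part of the patch:)

```cpp
// Hypothetical sketch: after dlopen'ing a backend module, resolve its
// initialization entry points. TargetInfo/Target/TargetMC exist for
// every in-tree target and are treated as mandatory; AsmParser and
// Disassembler are absent for some targets and fail silently.
#include <dlfcn.h>
#include <stdio.h>

static bool initComponent(void *Handle, const char *Sym, bool Mandatory) {
  void (*Init)() = (void (*)())dlsym(Handle, Sym);
  if (Init) {
    Init();
    return true;
  }
  if (Mandatory) // only report failures we cannot proceed without
    fprintf(stderr, "error: backend is missing %s: %s\n", Sym, dlerror());
  return false;
}

static bool initX86Backend(void *Handle) {
  if (!initComponent(Handle, "LLVMInitializeX86TargetInfo", true) ||
      !initComponent(Handle, "LLVMInitializeX86Target", true) ||
      !initComponent(Handle, "LLVMInitializeX86TargetMC", true))
    return false;
  initComponent(Handle, "LLVMInitializeX86AsmParser", false);
  initComponent(Handle, "LLVMInitializeX86Disassembler", false);
  return true;
}
```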

beanz added a subscriber: bogner.

+@bogner

Sorry for being late to the party. I think this very much needs to be under an option.

On platforms that use this with clang it will have an adverse impact on compile time, because it will increase the number of C++ symbols that need to be resolved (normally backend symbols are resolved internally). I'm really curious what the performance impact is; I remember benchmarking a number of years ago that using libLLVM on macOS had something like a 10% performance regression over static linking.

This will also have a significant adverse impact on the initial use case I added this support for: GPU backends. In GPU drivers the compiler usually comes bundled with a small number of backends, and the cost of dlopening would be extremely detrimental, because the programs being compiled are so small that loading the compiler needs to be as fast as possible.

So you think even for users that already link against libLLVM.so this will perform significantly worse?

beanz added a comment.Jul 28 2021, 7:39 AM

So you think even for users that already link against libLLVM.so this will perform significantly worse?

Yes, and since the performance cost is static and paid at load time, it will add a roughly fixed amount of time to every compiler process spawn, which will disproportionately impact the performance of shorter-running compiles (like GPU shaders).

Herald added a project: Restricted Project.Apr 1 2022, 10:40 AM
smeenai resigned from this revision.Apr 28 2022, 2:04 PM