This is an archive of the discontinued LLVM Phabricator instance.

[WIP][llvm] LLVM Busybox Prototype
Changes Planned · Public

Authored by leonardchan on Jun 21 2021, 9:03 PM.

Details

Summary

This is a rough prototype of the tool described in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151321.html. The tl;dr is that busybox attempts to merge all llvm tools into a single binary rather than building separate binaries. This will primarily be useful for toolchains that distribute a suite of llvm tools. This prototype only ports llvm-objdump and llvm-objcopy, but the technique can be applied to any combination of llvm tools.

Design:

  • Individual llvm tools are "librarified" into static libraries that are linked together into the busybox tool, which itself just dispatches to the "main" function of the requested tool.
  • The busybox binary is just called "llvm", but other llvm tool names can be symlinked to it (i.e. ln -s llvm llvm-objdump), and busybox should dispatch to the appropriate tool by checking argv[0].
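As a rough illustration of the argv[0] dispatch described above, here is a minimal sketch. It is not the code from this patch: baseName, ToolEntry, kTools and dispatchByArgv0 are hypothetical names, the two *_main functions are stubs standing in for the librarified tools, and only '/' is treated as a path separator.

```cpp
#include <cassert>
#include <string>

// Stubs standing in for the librarified tools' entry points. They return
// distinct codes only so that the dispatch is observable in this sketch.
int llvm_objdump_main(int, char **) { return 10; }
int llvm_objcopy_main(int, char **) { return 20; }

// Returns the final path component, e.g. "/usr/bin/llvm-objdump" gives
// "llvm-objdump". Assumption: only '/' separators are handled.
std::string baseName(const char *path) {
  std::string s(path);
  std::size_t slash = s.find_last_of('/');
  return slash == std::string::npos ? s : s.substr(slash + 1);
}

// Hypothetical table mapping a symlink name to a tool entry point.
struct ToolEntry {
  const char *name;
  int (*mainFn)(int, char **);
};

const ToolEntry kTools[] = {
    {"llvm-objdump", llvm_objdump_main},
    {"llvm-objcopy", llvm_objcopy_main},
};

// Dispatches on the name the binary was invoked as. Returns 1 when the
// name does not match any known tool.
int dispatchByArgv0(int argc, char **argv) {
  assert(argc > 0 && "argv[0] must be present");
  std::string invoked = baseName(argv[0]);
  for (const ToolEntry &t : kTools)
    if (invoked == t.name)
      return t.mainFn(argc, argv);
  return 1;
}
```

With a symlink named llvm-objdump pointing at the combined binary, a call such as dispatchByArgv0(argc, argv) from the real main would forward to llvm_objdump_main unchanged.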

Usage:

  • Various llvm tools can be invoked via busybox by passing a shortened tool name as its first argument (llvm objdump [objdump_args]).
  • Tools symlinked to busybox should "just work" out of the box with no implementation differences.
    • This is accounted for and all LLVM tests currently pass with busybox enabled.
  • Busybox is enabled via the cmake flag LLVM_ENABLE_EXPERIMENTAL_BUSYBOX which is OFF by default.
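For the first usage form (llvm objdump [objdump_args]), one plausible approach is to drop the leading "llvm" so the selected tool sees its own name in the argv[0] slot. The sketch below assumes that behaviour; ToolArgs and shiftToToolArgs are hypothetical names, and the prototype may handle this differently.

```cpp
#include <cassert>

// Hypothetical result type: the requested short tool name plus the
// argument vector that would be handed to that tool.
struct ToolArgs {
  const char *name; // e.g. "objdump"; nullptr if no tool was named
  int argc;
  char **argv;
};

// For "llvm objdump -d a.o" this yields name "objdump" and an argv of
// {"objdump", "-d", "a.o"}, i.e. the original argv shifted by one.
ToolArgs shiftToToolArgs(int argc, char **argv) {
  assert(argc >= 1 && "argv[0] must be present");
  if (argc < 2)
    return {nullptr, 0, nullptr}; // plain "llvm" with no tool name
  return {argv[1], argc - 1, argv + 1};
}
```

No strings are copied; the returned argv simply points one slot further into the original array, so the usual trailing null pointer is preserved.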

Measurements:
Each of these binary measurements is for the stripped + release build version, with --gc-sections enabled.

  • The busybox binary is ~20.5 MB.
  • The statically compiled llvm-objdump and llvm-objcopy are ~20 MB and ~4 MB respectively (~24 MB combined).
    • Size savings are likely from deduped symbols that were statically linked into the final binary.
  • The dynamically compiled llvm-objdump and llvm-objcopy that depend on libLLVM are ~695 KB and ~563 KB respectively. libLLVM is ~90 MB. The combined size of these is ~92 MB.
    • Size savings in this case are likely because --gc-sections removes a large chunk of libLLVM that would've been statically linked into the statically compiled binaries.

Implementation issues:

  • Ideally we wouldn't need to have so many cmake changes, but I might not have enough cmake mastery to reduce the complexity.
    • One noticeable issue is that cmake code for creating symlinks from the original binary is duplicated. See the inlined cmake comment below for an explanation of this.
  • Each tool seems to have different runtime configurations depending on what the symlinked tool is. These are all controlled via a similarly operating Is function that I just copied into the busybox implementation.

Adding a tool to the busybox (ideally we would have as few steps as possible):

In the tool's directory:

  1. Inside the tool's CMakeLists.txt, "librarify" the llvm binary by checking whether the busybox cmake flag is ON. This can be done by abstracting out the source files/arguments and passing them to a similarly named LLVM{ToolName} via add_llvm_library. See how this patch does it for llvm-objcopy/dump.
    • Note that symlinks should not be added within the tool's CMakeLists.txt if busybox is enabled, since the tool target hasn't been made yet. Those will be added to busybox's CMakeLists.txt.
  2. Add a macro check around the tool's main function to call an externally available main-like function. Something like:
+#ifdef LLVM_ENABLE_EXPERIMENTAL_BUSYBOX
+int llvm_objdump_main(int argc, char **argv) {
+#else
 int main(int argc, char **argv) {
+#endif

In busybox's directory:

  1. Inside busybox's CMakeLists.txt, add the newly created LLVM{ToolName} library as a dependency to the llvm tool and add {ToolName} to LLVM_LINK_COMPONENTS.
  2. Add any symlinks that would've pointed to the original tool to instead point to the busybox tool (llvm) at the end of CMakeLists.txt. All symlinks will now point to the busybox tool.
    • Note that while it is possible to have symlinks point to other symlinks, I ran into some cmake errors when doing this. See the long FIXME comment in llvm-busybox/CMakeLists.txt.
  3. In llvm/tools/llvm-busybox/Tools.def, add the appropriate TOOL macros for the various tools that busybox should dispatch to.
    • BusyboxName is the tool name that would be passed as the first argument to the llvm binary and what busybox compares against when attempting to dispatch.
    • LLVMName is the name of the original tool that would've been created (or symlinked) without busybox.
    • MainFunc is the main-like function added in step 2.
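The TOOL(BusyboxName, LLVMName, MainFunc) entries could be consumed with the usual X-macro pattern. The sketch below is an assumption about how that might look, not the contents of the actual Tools.def: the list lives in a HYPOTHETICAL_TOOLS macro so the example stays in one file (in-tree it would be a separate file pulled in with #include), and ToolInfo, kToolTable, findTool and the stub entry points are invented for illustration.

```cpp
#include <cassert>
#include <cstring>

// Stubs standing in for the librarified tools' entry points.
int llvm_objdump_main(int, char **) { return 10; }
int llvm_objcopy_main(int, char **) { return 20; }

// Stand-in for Tools.def, kept inline so this sketch is a single file.
#define HYPOTHETICAL_TOOLS(TOOL)                                               \
  TOOL("objdump", "llvm-objdump", llvm_objdump_main)                           \
  TOOL("objcopy", "llvm-objcopy", llvm_objcopy_main)

struct ToolInfo {
  const char *busyboxName; // matched against the first argument to "llvm"
  const char *llvmName;    // matched against the symlink name in argv[0]
  int (*mainFunc)(int, char **);
};

// Each TOOL entry expands to one initializer in the table.
const ToolInfo kToolTable[] = {
#define TOOL(BusyboxName, LLVMName, MainFunc) {BusyboxName, LLVMName, MainFunc},
    HYPOTHETICAL_TOOLS(TOOL)
#undef TOOL
};

// Looks a tool up by either of its names; returns nullptr if unknown.
const ToolInfo *findTool(const char *name) {
  assert(name && "tool name must not be null");
  for (const ToolInfo &t : kToolTable)
    if (!std::strcmp(name, t.busyboxName) || !std::strcmp(name, t.llvmName))
      return &t;
  return nullptr;
}
```

Keeping both names in one entry lets a single lookup serve the "llvm objdump" form and the symlink form.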

Followups:

  • The cmake machinery for creating "install" symlinks to an installed version of busybox has not been implemented yet. Under busybox, the only stripped tool we'll need to make is the busybox tool, and all other "stripped" tools should just be symlinks to this stripped busybox.
  • Find a way to remove duplicate cmake code in busybox and the original tool directory.
  • Figure out Windows support, where symlinks are mostly limited to administrators and developer mode. (Maybe we just don't support Windows for now?)

Event Timeline

leonardchan created this revision. Jun 21 2021, 9:03 PM
leonardchan requested review of this revision. Jun 21 2021, 9:03 PM
Herald added a project: Restricted Project. Jun 21 2021, 9:03 PM
leonardchan planned changes to this revision. Jun 21 2021, 9:04 PM
phosek added inline comments. Jun 22 2021, 3:05 PM
llvm/CMakeLists.txt
434

I think we should avoid the term busybox to prevent confusion with https://busybox.net/; instead we should probably use the term multiplexing (or muxing for short), which has also been used by https://wdtz.org/files/oopsla18-allmux-dietz.pdf.

We could name the option as LLVM_ENABLE_EXPERIMENTAL_MULTIPLEXING (or LLVM_ENABLE_EXPERIMENTAL_MUXING) and name the tool as llvm-mux.

llvm/tools/llvm-busybox/Tools.def
5–13

I'd like to avoid a centralized registry, which I think would be difficult to scale and impossible to use for out-of-tree tools.

Instead, I'd be more interested in following an architecture similar to what we use for passes: a ToolRegistry (akin to PassRegistry) with a single global instance that each tool registers with, similar to how RegisterPass is used.

We could consider requiring a class for each tool which would wrap the state (like the command line arguments) and then use it like:

static RegisterTool<ObjCopyTool> OT("objcopy", "Copies the contents of an object file to another");
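To make the registry suggestion concrete, here is a minimal sketch of what ToolRegistry and RegisterTool could look like. None of this exists in LLVM; the Tool base class, the Entry struct and every method name are assumptions made for illustration, and the ObjCopyTool body is a stub.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class wrapping a tool's state and entry point.
struct Tool {
  virtual ~Tool() = default;
  virtual int run(int argc, char **argv) = 0;
};

class ToolRegistry {
public:
  struct Entry {
    std::string description;
    std::function<std::unique_ptr<Tool>()> create;
  };

  // A function-local static avoids static initialization order problems
  // between the registry and the RegisterTool objects that use it.
  static ToolRegistry &instance() {
    static ToolRegistry registry;
    return registry;
  }

  void add(const std::string &name, Entry entry) {
    assert(tools.count(name) == 0 && "duplicate tool name");
    tools[name] = std::move(entry);
  }

  // Creates the named tool, or returns nullptr if it was never registered.
  std::unique_ptr<Tool> create(const std::string &name) const {
    auto it = tools.find(name);
    if (it == tools.end())
      return nullptr;
    return it->second.create();
  }

private:
  std::map<std::string, Entry> tools;
};

// Registers ToolT under the given name when a static instance is built.
template <typename ToolT> struct RegisterTool {
  RegisterTool(const char *name, const char *description) {
    ToolRegistry::instance().add(
        name, {description, [] { return std::unique_ptr<Tool>(new ToolT()); }});
  }
};

// Stub tool; a real one would hold parsed options and do the actual work.
struct ObjCopyTool : Tool {
  int run(int, char **) override { return 0; }
};

static RegisterTool<ObjCopyTool>
    OT("objcopy", "Copies the contents of an object file to another");
```

Because registration happens in each tool's own translation unit, an out-of-tree tool could add itself without touching a central list, provided its object file is actually linked in.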
15

Rather than duplicating the name, could we just construct BusyboxName from LLVMName by stripping the llvm- prefix if it exists?
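Deriving the short name is a small string operation. A minimal sketch, assuming names are plain std::string values and using a hypothetical helper name busyboxNameFor:

```cpp
#include <cassert>
#include <string>

// Returns the tool name with a leading "llvm-" removed, or the name
// unchanged when it has no such prefix.
std::string busyboxNameFor(const std::string &llvmName) {
  assert(!llvmName.empty() && "tool name must not be empty");
  const std::string prefix = "llvm-";
  if (llvmName.compare(0, prefix.size(), prefix) == 0)
    return llvmName.substr(prefix.size());
  return llvmName;
}
```

Under this scheme "llvm-objdump" maps to "objdump", while a name without the prefix is returned as is.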

llvm/tools/llvm-busybox/main.cpp
135

Does this need to be heap allocated if it doesn't outlive this function? Could this just be a SmallString?

llvm/tools/llvm-objcopy/llvm-objcopy.cpp
414–418

We could avoid the ifdef just by providing a macro that expands appropriately depending on the mode, something like:

LLVM_TOOL_MAIN(objcopy, int argc, char **argv)
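One possible definition of such a macro is sketched below. LLVM_TOOL_MAIN is only proposed in this review, so the definition is an assumption; the llvm_<name>_main naming follows the llvm_objdump_main example in the summary. The flag is defined by hand here so the sketch shows the busybox expansion; a real build would get it from cmake.

```cpp
#include <cassert>

// Normally supplied by the build system; defined here only so this
// sketch expands to the busybox form.
#define LLVM_ENABLE_EXPERIMENTAL_BUSYBOX 1

// Hypothetical definition: in busybox mode the tool's entry point is
// named llvm_<Name>_main, otherwise it is the ordinary main.
#ifdef LLVM_ENABLE_EXPERIMENTAL_BUSYBOX
#define LLVM_TOOL_MAIN(Name, ...) int llvm_##Name##_main(__VA_ARGS__)
#else
#define LLVM_TOOL_MAIN(Name, ...) int main(__VA_ARGS__)
#endif

// Expands to: int llvm_objcopy_main(int argc, char **argv)
LLVM_TOOL_MAIN(objcopy, int argc, char **argv) {
  assert(argv != nullptr);
  // Stub body: a real tool would parse its arguments and do its work here.
  return argc > 0 ? 0 : 1;
}
```

This keeps the #ifdef in one shared header instead of repeating it around every tool's main function.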
pcc added a subscriber: pcc. Jun 22 2021, 5:35 PM
pcc added inline comments.
llvm/CMakeLists.txt
434

Agreed that it shouldn't be named busybox. My first thought was that this was an LLVM licensed reimplementation of coreutils, which seems a little out of scope for a compiler project.

Maybe the tool (or at least the binary) should simply be named llvm? I can't think of any better reason why we might want to name a tool llvm, and it would allow convenient use of the multiplexer without symlinks without additional typing.

pcc added inline comments. Jun 22 2021, 5:38 PM
llvm/tools/llvm-busybox/CMakeLists.txt
11

Oh and now I see that's what it's already called. Never mind then.

aganea added a subscriber: aganea. Jun 28 2021, 2:34 PM

I think this will reduce the compile time of llvm a lot; currently llvm needs a lot of time to do link optimization.

Herald added a project: Restricted Project. May 13 2022, 12:46 PM
Herald added a subscriber: StephenFan.