This is an archive of the discontinued LLVM Phabricator instance.

[tools]Introduce llvm-lipo
ClosedPublic

Authored by alexander-shaposhnikov on May 14 2019, 6:29 PM.

Details

Summary

This diff starts the implementation of llvm-lipo which is supposed to be a drop-in replacement for the well-known tool lipo.
Lipo manipulates fat binaries, in particular, it can create a "fat" binary, extract regular object files from a fat binary,
report some useful information about the input.

In this diff to get things started we implement the option -verify_arch.
Next we are planning to contribute the implementation of

"-archs": (along the lines of the current diff).

"-thin": (takes a fat binary and writes the "slice" for a particular architecture into a separate file "as is". libObject's functionality is enough for it.

"-create": (takes multiple regular mach-o object files for different architectures and generates a "fat" binary, we are planning to use the standard libObject functionality for reading/inspecting/validating the input files,
populate MachO::fat_header and MachO::fat_arch structures and write them and the slices "as is".

Test plan:
make check-all

Diff Detail

Repository
rL LLVM

Event Timeline

alexander-shaposhnikov created this object with visibility "All Users".
Herald added a project: Restricted Project. · View Herald TranscriptMay 14 2019, 6:29 PM
Herald added a subscriber: mgorny. · View Herald Transcript

I hate to be "That Guy" but this really needs a docs/CommandGuide file.

I'm super-excited to see this work. This is shall we say "a really bad time of year" to get a thoughtful review from me ( for some reason :p ) but I'll move this to the top of my pile, circumstances permitting.

alexander-shaposhnikov edited the summary of this revision. (Show Details)May 14 2019, 10:59 PM

Looks good, though I'm not super familiar with the libObject APIs (yet) :)

test/tools/llvm-lipo/verify-arch-macho-binary.test
3 ↗(On Diff #199542)

lipo only supports this form of specifying flags (single dash, no equals). Do we want to support the others?

22 ↗(On Diff #199542)
test/tools/llvm-lipo/verify-arch-universal-binary.test
10 ↗(On Diff #199542)

Same comment about minimizing this as much as possible.

tools/llvm-lipo/llvm-lipo.cpp
33 ↗(On Diff #199542)

Shouldn't this be OneOrMore?

74 ↗(On Diff #199542)

Would it be better to use a SmallVectorImpl as the parameter type?

81 ↗(On Diff #199542)

Super nit: I think this would read more naturally if the left and right hand sides of the == were switched.

82 ↗(On Diff #199542)

Do we need the explicit Twine constructor here? I thought there was an operator+ for const char * and StringRef.

95 ↗(On Diff #199542)

Triple(Arch).getArch() is used a bunch. Should we save it to a variable?

alexander-shaposhnikov planned changes to this revision.May 14 2019, 11:47 PM

will update this diff tomorrow

A second thought on the topic of lipo compatibility. libObject is probably reasonable if your primary interest is MH_OBJECT files. There exist a number of final-linked-images that libObject's fat parser will not currently understand; these are formats that cannot be produced by llvm tools today. I wonder if choosing a different name for the tool might give you some flexibility to "do things differently" than lipo. That said, I don't have a better name in mind. :)

test/tools/llvm-lipo/verify-arch-macho-binary.test
3 ↗(On Diff #199542)

I don't see a problem with supporting "--" options here, from the point of view of lipo compatibility. I say, do it.

There are other ways where llvm-lipo is going to be an 'awkward fit' as a drop-in replacement for lipo. For example, lipo is a little unusual in that when -verify_arch is specified, all the remaining options are interpreted as architectures. So llvm-lipo interprets "-verify_arch x86_64 /bin/ls" as "verify /bin/ls has an arch for x86_64 in it" but lipo itself says [paraphrasing] "/bin/ls is not an architecture." Either you will type "input_file -verify_arch x86_64 i386" or you will type "-verify_arch x86_64 -verify_arch i386 input_file" (where input_file could appear anywhere), and whichever usage you specify will be specific either to "llvm-lipo" or the historical lipo.

Pragmatically I think you're going to want to get as close as you can, to avoid confusion. But recognize there are going to be differences if you want to use llvm's argument parsing.

tools/llvm-lipo/llvm-lipo.cpp
33 ↗(On Diff #199542)

No. -verify_arch is one of several mutually-exclusive lipo commands. You cannot specify both -verify_arch and -info for example. So ZeroOrMore is correct, and the llvm-lipo command driver is going to have to enforce this exclusivity.

95 ↗(On Diff #199542)

Do you mean "O->getArch()"? I agree, if so.

96 ↗(On Diff #199542)

the lipo -verify_arch command does not print an error if the command is able to run successfully, as it's primarily for using with shell scripts and Makefiles. Is a verbose message really necessary here? If less, please implement a -non_verbose option.

mtrent added inline comments.May 15 2019, 4:04 PM
tools/llvm-lipo/llvm-lipo.cpp
96 ↗(On Diff #199542)

Sigh. s/If less/If yes/;

Address comments, have not added docs in this commit yet, can do as a follow-up or will update this diff a bit later.

regarding -verify_arch X Y Z - yeah, I can try to make it work with LLVM's command line options parser, not sure if this interface is ideal / better than -verify_arch X -verify_arch Y -verify_arch Z, but yeah, on the other hand, this would be an incompatibility / would require customers of the old tool to update their scripts/code. The downside of the old interface - the positional argument should go before -verify_arch. To be honest at the moment I don't see how to express the old behavior of -verify_arch using libCommandLine, any suggestions would be helpful, alternatively we can try to live with this incompatibility.

In D61927#1503871, @alexshap wrote:

regarding -verify_arch X Y Z - yeah, I can try to make it work with LLVM's command line options parser, not sure if this interface is ideal / better than -verify_arch X -verify_arch Y -verify_arch Z, but yeah, on the other hand, this would be an incompatibility / would require customers of the old tool to update their scripts/code. The downside of the old interface - the positional argument should go before -verify_arch. To be honest at the moment I don't see how to express the old behavior of -verify_arch using libCommandLine, any suggestions would be helpful, alternatively we can try to live with this incompatibility.

Well, let's have a look at Apple's command syntax. I re-wrote the man page for the latest release, so it's a little clearer to see what's going on. I'll summarize here.

These commands/options take no arguments and are no trouble at all for llvm's command-line parser: -archs, -detailed_info, -info, -hideARM64

These commands/options take a single argument, and only one of these options is allowed. Again, easy mode: -thin, -output.

These commands take a single argument, and any number of them can appear (as long as this is the only such command that appears). Easy mode, you have to handle the command exclusivity yourself in your command driver anyway, so that's not really any extra work: -extract, -extract_family, -remove,

These commands / options take a fixed number of arguments, where n > 1: -create, -replace, -arch, -segalign. I'm under the impression this is something llvm does not support of the box, because we see this problem wtih "llvm-nm -s". But it's possible that the "AdditionalVals" parameter on class Option is meant to deal with this situation. But, if so, no one has circled back to llvm-nm. Some of these commands are pretty critical.

And this command is the worst of the bunch, taking any number of arguments. It's also the one you implemented first :): -verify_arch.

Looking ahead to your task of making a drop-in replacement for Apple's lipo, you're going to have to tackle -verify_arch, -create, -replace, -arch, and -segalign. That's a non-trivial surface area of the lipo tool. My advice is to see if you can model the complete set of lipo command-line options you want to reverse-engineer -- don't worry about implementing the feature, just parse the arguments. If -verify_arch is the only one that doesn't fit the mold, I think "just live with it" is the right call. But if you can't get -create to work with an arbitrary number of -arch or -arch_blank options, or if you can't get -replace working, you're going to have trouble making a viable "drop-in-replacement."

HTH!

@mtrent, good call, will have a look and update this diff

alexander-shaposhnikov planned changes to this revision.May 15 2019, 10:01 PM
This comment was removed by alexander-shaposhnikov.
seiya added a subscriber: seiya.May 15 2019, 10:05 PM

Use a bigger hammer to parse command line options

alexander-shaposhnikov changed the visibility from "All Users" to "Public (No Login Required)".May 17 2019, 3:59 PM
mtrent accepted this revision.May 23 2019, 5:49 PM

That's a pretty big hammer. Let's give it a try!

This revision is now accepted and ready to land.May 23 2019, 5:49 PM
This revision was automatically updated to reflect the committed changes.