Page MenuHomePhabricator

[llvm-objcopy][MachO] Add support for removing Swift symbols
ClosedPublic

Authored by alexshap on May 17 2020, 9:19 PM.

Details

Summary

cctools' strip has the option "-T" which removes the symbols whose names begin with _$S' or _$s' only if it finds an __objc_image_info section containing a non-zero swift version (and the binary is either an executable or a dynamic library).
This diff implements this option in llvm-strip for MachO.

Test plan: make check-all

Diff Detail

Event Timeline

alexshap created this revision.May 17 2020, 9:19 PM
Herald added a project: Restricted Project. · View Herald Transcript
alexshap edited the summary of this revision. (Show Details)May 17 2020, 10:33 PM
alexshap updated this revision to Diff 264536.May 17 2020, 10:39 PM

Add early return

alexshap updated this revision to Diff 264549.May 18 2020, 12:12 AM

Apply clang-format

The logic looks good and matches up with cctools strip.

The man page does say:

-T     The intent of this flag is to remove Swift symbols.  It removes the symbols whose names begin with `_$S' or `_$s' only when it finds an __objc_imageinfo section with and it has a non-zero swift version.  The future the implementation of this  flag
       may change to match the intent.

so we should keep an eye out for future behavioral changes. Also CC @mtrent if he has any input on that.

llvm/test/tools/llvm-objcopy/MachO/remove-swift-symbols.test
2

Can you also add a test for Swift symbols *not* being stripped when a Swift version isn't present? It would also be great to test _$S symbols in addition to _$s ones.

llvm/tools/llvm-objcopy/COFF/COFFObjcopy.cpp
255

This is okay for now, but note that Swift supports both COFF and ELF, so theoretically this flag could apply there as well. I don't know what the equivalent symbols would look like on COFF or ELF though ... @compnerd?

llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
607–609

Same coment here.

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
77

& has stupidly low precedence in C/C++ (because && didn't exist once upon a time), so this is gonna parse as

(Obj.Header.Flags & (MachO::MH_DYLDLINK == MachO::MH_DYLDLINK))

You'll need the parens around (Obj.Header.Flags & MachO::MH_DYLDLINK) instead.

Moreover, since MH_DYLDLINK is a single bit, can't you just do

(Obj.Header.Flags & MachO::MH_DYLDLINK)

or, if you wanna be really explicit

(Obj.Header.Flags & MachO::MH_DYLDLINK) != 0

?

alexshap marked an inline comment as done.May 19 2020, 1:17 PM
alexshap added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
77

good catch, thanks!

alexshap updated this revision to Diff 265056.May 19 2020, 3:03 PM

Address comments

alexshap marked 2 inline comments as done.May 19 2020, 3:06 PM
smeenai accepted this revision.May 19 2020, 3:30 PM

LGTM

This revision is now accepted and ready to land.May 19 2020, 3:30 PM
alexshap updated this revision to Diff 265102.May 19 2020, 5:40 PM

Apply clang-format

Please don't forget to update the CommandGuide doc for llvm-strip. Is this option deliberately not an llvm-objcopy option?

llvm/test/tools/llvm-objcopy/MachO/remove-swift-symbols.test
7

Nit: it's fairly typical to ad addditional spaces to makes the first line here line up with the rest of the output:

# NO-SWIFT-SYMBOLS:      Symbols [
# NO-SWIFT-SYMBOLS-NEXT:   Symbol {

Same applies below.

136

Probably change "since" -> "when"

llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
608

I think there's a std::errc::not_supported you could use here.

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
76–77

Seems like you need test cases here for a) MH_DYLDLINK not being set and b) SwiftVersion being 0.

I personally would find it easier to read if the version check were explicitly compared against 0: *Obj.SwitftVersion != 0.

llvm/tools/llvm-objcopy/MachO/MachOReader.cpp
294–297

Seems like most of these clauses in this if are untested? I'd expect to see various test inputs with an __objc_imageinfo in each of these segments, plus one with it being in a different segment and one that is too small.

299

This also seems to be untested.

alexshap marked an inline comment as done.May 20 2020, 2:35 AM
alexshap added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
76–77

@jhenderson :

  1. If you look at the man pages for cctools' strip then you will see that there are remarks there indicating that this behavior can change in the future. Regarding tests (what you mentioned in the comment above and below) - I didn't do that intentionally since it looks like they will have to be either removed or adjusted in the future, the current implementation is consistent with the latest version of cctools.

If you insist on covering those (very similar) cases - we can add them, though I think right now it seems to be ahead of time / not very useful.

  1. I'm not aware of any official objcopy (from Apple) which would work for MachO (in particular, cctools contains strip and other things, but not objcopy), so right now it looks like there is no need to introduce such flag to llvm-objcopy. If somebody requests it - sure, we can consider adding it. However, since we are trying to make llvm-strip a drop-in replacement for strip (in this particular case - I'm referring to the tool 'strip' from Xcode (cctools)) - we have to implement this option.
  1. regarding the comment about big-endian - if I'm not mistaken the PowerPC backend (for Apple platforms) was removed even from Clang a few years ago, so at the moment it's a bit unrealistic (if I'm not mistaken) that we'll have a big-endian MachO executable or dynamic library built from Swift sources. However, since people can use LLVM on a big-endian host machine (and if I am not mistaken we have such buildbots) - we need to take care of those conversions. If there is an issue those big-endian buildbots are supposed to catch it.
alexshap marked an inline comment as done.May 20 2020, 2:44 AM
alexshap added inline comments.
llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
608

I think it's important to be consistent in what we return in such cases,
I've looked at the COFF / MachO versions - they both return invalid_argument.
So my point is that if we want to change it then it should be discussed and done separately.

alexshap marked an inline comment as done.May 20 2020, 7:21 AM
alexshap added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
76–77

Looks like I have an idea how to organize these tests in a better way, I'll update this diff.

alexshap updated this revision to Diff 265258.May 20 2020, 8:19 AM

Address @jhenderson's comments

Thanks @alexshap. The only thing outstanding from my point of view, aside from a nit, is documenting the switch. I think we can keep it concise in llvm-strip's documentation by just saying the same thing as the help text. That way, if details change, it's not a problem. Users probably don't need to know the details. BTW, I couldn't quickly find a cctools strip man page online, so I can't comment on their documentation.

llvm/test/tools/llvm-objcopy/MachO/remove-swift-symbols.test
2

Here and below: swift -> Swift

alexshap marked an inline comment as done.May 21 2020, 9:30 AM
alexshap added inline comments.
llvm/test/tools/llvm-objcopy/MachO/remove-swift-symbols.test
2

yeah )

alexshap updated this revision to Diff 265617.May 21 2020, 3:02 PM

Update CommandGuide/llvm-strip.rst

This revision was automatically updated to reflect the committed changes.

Isn't this potentially problematic from a compatibility point-of-view, since it squats a single-letter command-line argument for a mach-o-specific feature, while llvm-strip is generally trying to be compatible to GNU strip?

Isn't this potentially problematic from a compatibility point-of-view, since it squats a single-letter command-line argument for a mach-o-specific feature, while llvm-strip is generally trying to be compatible to GNU strip?

I think it may be problematic if GNU strip reserves -T for other purposes. Can we have a --strip-* long option instead? --strip_swift_symbols (the obvious option name inferred from the TableGen variable) may not be most suitable because from the description "_$S' or _$s' only if it finds an __objc_image_info section containin" this option only strips some special cased symbols, not all Swift symbols.

alexshap added a comment.EditedMay 26 2020, 10:13 PM

Thank you for the question!

First, I checked in advance the documentation https://sourceware.org/binutils/docs/binutils/strip.html - there is no conflict for this option, if one day this issue comes up, or, more generally, if one day we have to parse command-line options differently for different formats (in this case it might be MachO or COFF or ELF) then the approach used in llvm-objcopy/llvm-strip will have to be adjusted anyway. It looks like at the moment there is no urgency to rename this option (+ it would introduce some friction for users on OSX, they won't be able to seamlessly switch from cctools strip to llvm-strip, especially for large complex projects). Additionally, If these issues arise, I'm not sure renaming is really the right solution, although this question requires more thinking (i'm not ready to answer it right away). One can imagine several other options how to resolve this hypothetical issue without breaking the downstream users (or at least with much smaller disruption): either introduce a "new driver", or (similarly to llvm-objdump) introduce "--macho", there are other approaches as well (it's not the full list).

Second, to the best of my knowledge the support for MachO in binutils is fragmented (e.g. strip doesn't properly support MachO if I'm not mistaken) and even small bits which exist are outdated (e.g. after a very brief look I don't see any mentions of the load commands which were introduced in the iOS era).

For Apple platforms (iOS, OSX) it's common to use tools provided by Xcode, and (again, to the best of my knowledge) several important utilities (distributed by Xcode) (install-name-tool, strip, lipo, bitcode-strip) actually come from cctools.
So to use llvm-strip in place of cctools' strip we need functional parity (at least to some extent). Moreover, having support for these features is also useful in some other cases. For example, in the past I used to work on Swift,
there are developers hacking on Swift which are using non-Apple platforms as their primary development environments, having such tools unblocks a number of use-cases
(the most trivial one - running tests for MachO binaries (which don't require the actual runtime, e.g. various tools which parse / extract some information from object files) on non-OSX host platforms).

I'm okay with using the short option (although a long option wouldn't hurt too FWIW). I think it's useful to have compatibility with as many strip tools as we reasonably can. Whilst GNU compatibility is obviously our primary concern, where GNU doesn't have certain options but other versions of strip do, I think we should emulate them. The benefits have already been outlined (less friction in transitioning etc). A good example is the --strip-sections option which I believe came from eu-strip. I would hope that GNU might consider what options other tools already provide when choosing new options, so I would hope that risk of future compatibility is reduced. When there is no precedent for a short option in other strip/objcopy tools however, I think it is highly desirable to not add a short option alias ourselves, unless the meaning is obvious (e.g. in GNU dumping tools, -C always means --demangle, so if we have --demangle, having -C is fine even though GNU don't currently).

We already _aren't_ and _cannot_ be compatible with cctools strip's command-line, because cctools strip already has a bunch of one-letter option choices which actively conflict with existing options in llvm-strip (and GNU strip from which we copied them). There's at least -s -R -d -N with conflicting meanings.

Given that, I do not think adding "-T" to llvm-strip is good idea. While it may not conflict today, it might tomorrow. With command-line compatibility to cctools out of the question, it then comes down to a question of naming -- and "-T" is just a bad choice for a special-purpose option like this. It ought to have been given a descriptive option name in the first place.

I do think we should just rename this to --strip-swift-symbols for llvm-strip. If, in the future, someone wants to make another command which is a drop-in-replacement for cctools strip, that could be done, and it will need to do option-translation.

alexshap added a comment.EditedMay 27 2020, 10:40 AM

@jyknight, thank you for the comment.

  1. Just to be clear - introducing new names alone will make it incompatible with a huge variety of existing projects / their build systems. If hypothetically you move this burden to the layer of a build system then users will have to deal with this issue N times, where N is equal to the number of users or to the number of different build systems (depending on how you count). It would create an unnecessary obstacle and would cause massive duplication of efforts.

P.S. llvm-objcopy indeed has reimplemented a number of command-line options of binutils objcopy,
but it also has some extensions (flags which binutils objcopy doesn't support) as well as some omissions (flags which binutils objcopy supports but llvm-objcopy doesn't) so in this respect it doesn't create a new precedent.
It's also true for other binary-level tools in LLVM which emulate (to some extent) the interface of their counterparts from other projects.

  1. The current state of llvm-strip - some projects build successfully if one switches to llvm-strip, though certainly these tools are still under development (and this statement is not specific to MachO) and different features are implemented when they are needed. I'm aware of several ways how people use llvm-objcopy / llvm-strip for MachO.

That said, your concern is valid, indeed, in the future the issue which you mentioned may come up (though there is no conflict now, moreover, introducing a new name would break the existing use-cases).
Given that bintuils strip for MachO is not used (and it's not really fully functional if I am not mistaken) - the most common (and sort of the most realistic) way to use llvm-strip for MachO is to use it in place of the "system strip" (coming cctools) (on OSX).

Taking into account what was mentioned in (1) renaming (or renaming alone) doesn't appear to the right solution.

As soon as we run into this issue - no doubt, a solution will need to be implemented. The current state doesn't appear to block it or prevent it in any way, moreover, it enables us to move step by step
and there are multiple ways to solve this problem.
"If, in the future, someone wants to make another command which is a drop-in-replacement for cctools strip, that could be done, and it will need to do option-translation" - this is essentially one of the options mentioned above, though the details require more thinking.
Let me point out that llvm-strip is also just another driver for llvm-objcopy (it was introduced here: https://reviews.llvm.org/rL331663),
all what it does - it translates options and invokes llvm-objcopy).

MaskRay added a comment.EditedMay 27 2020, 11:19 AM

Reading https://opensource.apple.com/source/cctools/cctools-949.0.1/man/strip.1.auto.html , I confirm that -S -r -D -n are conflicting.

@jyknight, thank you for the comment.

  1. Just to be clear - introducing new names alone will make it incompatible with a huge variety of existing projects / their build systems. If hypothetically you move this burden to the layer of a build system then users will have to deal with this issue N times, where N is equal to the number of users or to the number of different build systems (depending on how you count). It would create an unnecessary obstacle and would cause massive duplication of efforts.

How can llvm-strip be plugged into an existing project if -S -r -D -n (supposedly common usage) do not work?

P.S. llvm-objcopy indeed has reimplemented a number of command-line options of binutils objcopy,
but it also has some extensions (flags which binutils objcopy doesn't support) as well as some omissions (flags which binutils objcopy supports but llvm-objcopy doesn't) so in this respect it doesn't create a new precedent.
It's also true for other binary-level tools in LLVM which emulate (to some extent) the interface of their counterparts from other projects.

FWIW For many such extensions I have created feature requests on binutils bugzilla or even implemented the features by myself.
If we are going to add a new long option, I hope we can see efforts asking cctools to adopt. Some option names may potentially be reused with different semantics. If they don't want to implement those features, we can at least get a promise that they will not intentionally use the options names for difference purposes.

For this particular case, -T is a really unfortunate name for a special cased feature. I think we need to persuade them to add a long option.

  1. The current state of llvm-strip - some projects build successfully if one switches to llvm-strip, though certainly these tools are still under development (and this statement is not specific to MachO) and different features are implemented when they are needed. I'm aware of several ways how people use llvm-objcopy / llvm-strip for MachO.

That said, your concern is valid, indeed, in the future the issue which you mentioned may come up (though there is no conflict now, moreover, introducing a new name would break the existing use-cases).
Given that bintuils strip for MachO is not used (and it's not really fully functional if I am not mistaken) - the most common (and sort of the most realistic) way to use llvm-strip for MachO is to use it in place of the "system strip" (coming cctools) (on OSX).

Given the -S -r -D -n incompatibility, I think we will run into trouble soon.

@MaskRay - if i'm not mistaken for dynamic libraries the maximum level of stripping is usually -x (or -x -T), also for executables it's frequently used without any options at all (e.g. strip main.exe).
Also note that we reached a functional state for some non-trivial transformations (in llvm-objcopy) relatively recently (~weeks ago from now) .
Yeah, given the discrepancies it looks like we'll have to solve this problem soon.

The current state doesn't appear to block it or prevent it in any way

It does prevent it. This change introduces an argument, -T to the option syntax of the llvm-strip tool, which we'd be better off without. But, once we add it, users of llvm-strip will depend on it, and then we can't easily remove it later on. The only argument for the name -T is compatibility with cctools strip, which is unachievable within a single driver syntax, anyways.

Now, if it wasn't for all the other incompatibilities with cctools' strip, I could buy the argument that it's worth adding -T, despite that it was a bad choice of name, and despite the potential for future conflicts. But, since compatibility is impossible, we have those downsides, and little upside to go along with it.

introducing a new name would break the existing use-cases

I don't see how, since llvm-strip isn't a drop in replacement for cctools strip anyhow.

The current state doesn't appear to block it or prevent it in any way

It does prevent it. This change introduces an argument, -T to the option syntax of the llvm-strip tool, which we'd be better off without. But, once we add it, users of llvm-strip will depend on it, and then we can't easily remove it later on. The only argument for the name -T is compatibility with cctools strip, which is unachievable within a single driver syntax, anyways.

@jyknight - alright, I wouldn't like to repeat the arguments already presented and want to put this discussion into a more constructive course to avoid moving back and forth.

"It does not" for the following reasons. First, it's supported only for MachO (and by the way there are options supported only for ELF), so the only way users can depend on it is a legitimate one (the system strip supports this flag and various projects use it). Second, from users' perspective the fact that "strip" is only a driver for another tool (whether it's the existing driver, the existing driver with a new "flavor" or a new driver) is a detail of implementation (and it is hidden from them). So if e.g. we introduce a new way to invoke llvm-strip (llvm-objcopy) it should not affect them / disrupt their workflow, thus from users' perspective it doesn't fundamentally alter this change.
Also, please, note that the argument regarding binutils for MachO is invalid (see comments above).

Now, if it wasn't for all the other incompatibilities with cctools' strip, I could buy the argument that it's worth adding -T, despite that it was a bad choice of name, and despite the potential for future conflicts. But, since compatibility is impossible, we have those downsides, and little upside to go along with it.

introducing a new name would break the existing use-cases

I don't see how, since llvm-strip isn't a drop in replacement for cctools strip anyhow.

Let me remind (and it has already been mentioned above) it's not fully functional yet but it's moving forward, and there are projects which build with it, it's just one of the first steps,
introducing a new name would break it without providing much benefits at all.
Basically llvm-strip would exist on OSX but would be unusable and would make it harder even to test it / plug into a real world project and see what features are missing and what needs to be fixed.
Based on the experience with ELF I'm pretty sure it's a long path and we will encounter and fix issues along the way, and llvm-objcopy (llvm-strip) for ELF didn't magically become compatible with binutils immediately.

That said, the problem with the discrepancies will have to be addressed at some point, nobody disagrees with it and, in particular, I'm open to discuss the approaches, should it be a new driver, a new "flavor" or something else.

I don't have a good solution to the conflict here, but I want to summarise some of the options we have, with a brief summary of the pros and cons of each. This is duplicating some of what is above, but will hopefully help frame my own and possibly others thinking:

  1. Add a long option instead of a short option. Pros: simple, shouldn't be incompatible with a future GNU option abbreviation. Cons: Incompatible with cctools strip, making it impossible to just drop it in as a replacement for that tool for usages that otherwise would work (i.e. which don't use other incompatible options).
  2. Add a long option and a short option. Pros: provides a migration path for users who want to adopt llvm-strip instead of cctools trip; provides a possible future alternative in the event that GNU use -T for something else. We'd probably want to remove our alias, which would break some people's builds, but at least they'd have an alternative without changing away from using llvm-strip. Cons: people might start using -T when they should use the long option. This could be somewhat mitigated with documentation/warnings etc, but perhaps this is suboptimal. Also doesn't allow people to smoothly migrate if we have to remove the short option in the future. Also we'd have to have some sort of delay between removing the short option and reinstating it with the new meaning to prevent a silent behaviour change. This throws up questions like "how long a delay", so probably isn't ideal.
  3. Have a different cctools-compatible front-end for llvm-strip. Pros: allows users of cctools strip to migrate. Cons: requires more maintenance (more tablegen options etc to maintain); wouldn't work as a system linker using the current determined-by-filename approach (currently the file name drives whether the tool is llvm-objcopy or llvm-strip and we already say "if name contains strip, treat as llvm-strip", so can't overload it with "treat as cctools strip").
  4. Have a different option (e.g. --macho) that controls the behaviour of option parsing. Pros: allows us to treat options differently depending on the presence of the switch. Cons: doesn't allow for immediate drop-in replacement.
  5. New idea: change behaviour based on host OS, similar to llvm-ar behaviour and llvm-cxxfilt --strip-underscore. Pros: allows drop-in replacement on Mach-O based systems; has some precendent in the other tools. Cons: doesn't allow for cross testing easily (i.e. would require something like --macho too); could be a little confusing to future users.

I just came up with it, but I like 5). I think it would satisfy most, possibly all of our requirements, has existing precedent in our other tools, and poses a low risk of future incompatibility. What do others think?