This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy][MachO] Implement --only-section
ClosedPublic

Authored by seiya on Jul 31 2019, 2:39 PM.

Event Timeline

seiya created this revision.Jul 31 2019, 2:39 PM
llvm/tools/llvm-objcopy/CopyConfig.h
102

Name.conains(',')

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
28

return !is_contained(Config.OnlySection, Sec.CanonicalName)

llvm/test/tools/llvm-objcopy/MachO/only-section.test
3

0. I think we need to make this test more robust and verify that the content of those sections is correct as well as the load commands, etc
we also need tests:

  1. where the specified arguments are invalid (to test error messages)
  2. what happens when the specified section name is not present in the input binary
  3. where all the sections should stay in place (i.e. --only-section lists all the available sections) -> the output binary should be "the same" (unless there is some ambiguity, i.e. the values of the padding bytes might be different, but I'd add a small test (as small as possible))
jdoerfert resigned from this revision.Aug 1 2019, 9:58 AM
seiya updated this revision to Diff 212979.Aug 2 2019, 12:25 AM
seiya marked 2 inline comments as done.
  • Addressed some review comments.
seiya marked an inline comment as done.Aug 2 2019, 12:31 AM
seiya added inline comments.
llvm/test/tools/llvm-objcopy/MachO/only-section.test
3

0. I think we need to make this test more robust and verify that the content of those sections is correct as well as the load commands, etc

AFAIK, yaml2macho.cpp fills sections with dummy data (0xDEADBEEFu) so I'll write another patch to allow specifying arbitrary binary data in a MachO YAML.

seiya updated this revision to Diff 213736.Aug 6 2019, 3:37 PM
  • Addressed a review comment.
seiya updated this revision to Diff 213740.Aug 6 2019, 3:42 PM

Re-upload the diff: the previous update was mistake.

seiya updated this revision to Diff 215341.Aug 15 2019, 1:34 AM
  • Improved the test cases.
seiya marked an inline comment as done.Aug 15 2019, 1:35 AM
seiya updated this revision to Diff 215342.Aug 15 2019, 1:37 AM
  • Added some newlines for readability.
Harbormaster completed remote builds in B36812: Diff 215342.

This goes for all the new functionality you are adding, but make sure to review the llvm-strip and llvm-objcopy documentation (llvm/docs/CommandGuide) for any changes that need to be made. For example, if any of the switches are currently listed as "ELF specific" they need to be moved into the general switch section.

rupprecht marked an inline comment as done.Aug 15 2019, 12:30 PM
rupprecht added inline comments.
llvm/tools/llvm-objcopy/CopyConfig.cpp
419

These errors should also include the full Name string so the user knows specifically which flag is the culprit.

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
35–46

Aside: we still have a lot of ELF-specific stuff in CopyConfig, which should theoretically be relatively platform-agnostic -- for example, the --add-symbol parsing code creates functors that assign the symbol binding to ELF::STB_* values. We should think about how to properly separate ELF/MachO/COFF specific things there.

llvm/tools/llvm-objcopy/MachO/Object.cpp
15–17

SymTable/StrTable are explicitly referenced in MachO/Object.h, so if either of those are matched by the predicate, those tables should be cleared out too. (And there should be test cases to verify this).

Also, you might want to use stable_sort to remove sections -- see ELF's implementation.

Although specific to --only-section (see replaceAndRemoveSections, ELFObjcopy.cpp), StrTable/SymTab should not be implicitly removed, so it's not really observable with just this patch.

seiya updated this revision to Diff 215517.Aug 15 2019, 6:46 PM
seiya marked 3 inline comments as done.
  • Updated CommandGuide.
  • Addressed a review comment.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
35–46

I think so too. What complicates the problem is that we don't know the file format until we parse the command line options and decode the input file. How about splitting the command-line parsing into two stages?

CopyConfig.h

struct UnparsedNewSymbolInfo {
  StringRef Bind; // Don't parse it for now.
  // ...
};

class CopyConfig {
  // Format-independent options.
  StringRef InputFilename;
  FileFormat InputFormat;
  MachineInfo BinaryArch;

  // ... 

  // Format-specific options.
  UnparsedNewSymbolInfo NewSymbolInfo;
};
ELFCopyConfig.h

struct NewSymbolInfo {
  uint8_t Type = ELF::STB_GLOBAL;
  // ...
};

class ELFCopyConfig : public CopyConfig {
  ELFCopyConfig(CopyConfig &Config) {
    // We should cache the result in case there're many input files to process.
    parse(Config);
  }
};
ELFObjcopy.cpp
Error executeObjcopyOnBinary(const CopyConfig &UnparsedConfig, ...) {
  ELFCopyConfig Config(UnparsedConfig);
  // ...
}

In the first stage (CopyConfig), we parse only format-independent options. In second stage (ELFCopyConfig), we parse format-specific options (the Bind field, in this example).

llvm/tools/llvm-objcopy/MachO/Object.cpp
15–17

SymTable/StrTable are explicitly referenced in MachO/Object.h, so if either of those are matched by the predicate, those tables should be cleared out too. (And there should be test cases to verify this).

Unlike ELF, in MachO, SymTable/StrTable are not sections, but are load commands. Thus, we never remove them by this removeSections.

Also, you might want to use stable_sort to remove sections -- see ELF's implementation.

Perhaps you meant stable_partition?

seiya updated this revision to Diff 215810.Aug 18 2019, 9:12 PM

Refactored a bit.

jhenderson added inline comments.Aug 19 2019, 3:00 AM
llvm/test/tools/llvm-objcopy/MachO/only-section.test
7

Error case 1? Where are error case 2, 3...?

seiya updated this revision to Diff 215988.Aug 19 2019, 2:42 PM
seiya marked 2 inline comments as done.

Fix a review comment.

llvm/test/tools/llvm-objcopy/MachO/only-section.test
7

It's certainly confusing. Removed 1.

jhenderson added inline comments.Aug 20 2019, 2:08 AM
llvm/test/tools/llvm-objcopy/MachO/only-section.test
16

"Specify __TEXT,__text" -> "Specify one section."
"only the" -> "only that"

59–60

You say "without warning" here, but don't actually check for a warning, which you probably should do, if you're going to mention it in the comment.

seiya updated this revision to Diff 216127.Aug 20 2019, 6:23 AM

Updated CommandGuide.

rupprecht marked 2 inline comments as done.Aug 21 2019, 1:08 PM

LGTM, once a MachO reviewer signs off. No action needed for my two comments.

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
35–46

Aside: woah, didn't realize phab could display cool tabbed codeblocks like that.

Yes, that's a decent sketch for how it might work. CopyConfig itself should be returning uninterpreted StringRefs instead of ELF/MachO-specific things.

llvm/tools/llvm-objcopy/MachO/Object.cpp
15–17

Yeah, stable_partition, good catch.

In retrospect, I think the use of ELF/removeSections actually may not need to use stable_partition, and switching to remove_if may be faster. IIUC the difference is that stable_partition guarantees the things past the partition point are also in stable order, whereas remove_if does not (and is faster for not having to). So disregard my original suggestion.

seiya marked an inline comment as done.Aug 22 2019, 10:14 PM
seiya added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
35–46

OK, I'll propose it in a separate patch.

(I've been wanting to try this feature. It's neat.)

seiya marked an inline comment as done.Aug 22 2019, 10:50 PM
seiya added inline comments.
llvm/tools/llvm-objcopy/MachO/Object.cpp
15–17

As you say, it seems that we should use remove_if instead. According to this blog post (lacks the details of the experiment environment though), remove_if is faster than stable_partiton. I think normally the number of sections are small, so using remove_if improve the performance not so much though.

seiya planned changes to this revision.Aug 26 2019, 1:36 AM

I'll submit another patch to refactor CopyConfig as described in https://reviews.llvm.org/D65541#inline-594247

seiya updated this revision to Diff 222342.Sep 29 2019, 8:11 PM

Add MachOConfig.cpp to isolate command-line validation logic from MachOObjCopy.cpp.

jhenderson added inline comments.Sep 30 2019, 5:08 AM
llvm/tools/llvm-objcopy/MachO/MachOConfig.cpp
22–23 ↗(On Diff #222342)

I know this was there before, but I'm not sure I understand this TODO. I think it's saying that support should be added to GNU objcopy for something? If so, I don't think you want this TODO in the code. TODOs in LLVM code should be about LLVM changes that should be made at some point.

llvm/tools/llvm-objcopy/MachO/MachOConfig.h
12–13 ↗(On Diff #222342)

I don't think you need these two includes here.

22–23 ↗(On Diff #222342)

Is this clang-formatted? I thought it should be struct MachOCopyConfig {}; but could be wrong easily.

It might be worth a simple comment saying why there's an empty struct. I initially thought it was an incorrect attempt to forward declare it!

seiya updated this revision to Diff 222539.Sep 30 2019, 8:13 PM
seiya marked 5 inline comments as done.

Addressed review comments.

seiya marked 4 inline comments as done.Sep 30 2019, 8:17 PM
seiya added inline comments.
llvm/test/tools/llvm-objcopy/MachO/only-section.test
59–60

Removed "without warning" from the comment since the behavior is not so surprising.

llvm/tools/llvm-objcopy/MachO/MachOConfig.cpp
22–23 ↗(On Diff #222342)

Now I noticed that comment does not make sense. This TODO is for LLVM code, not GNU objcopy. Rephrased the comment.

llvm/tools/llvm-objcopy/MachO/MachOConfig.h
22–23 ↗(On Diff #222342)

It might be worth a simple comment saying why there's an empty struct. I initially thought it was an incorrect attempt to forward declare it!

Good point! Added a comment.

seiya marked 2 inline comments as done.Sep 30 2019, 8:18 PM
jhenderson accepted this revision.Oct 1 2019, 1:42 AM

I don't know about any Mach-O specific details, but everything else looks good to me. Best get somebody else with Mach-O expertise to confirm.

llvm/tools/llvm-objcopy/MachO/Object.h
42

CannonicalName -> CanonicalName

This revision is now accepted and ready to land.Oct 1 2019, 1:42 AM
seiya updated this revision to Diff 222609.Oct 1 2019, 7:26 AM

Addressed a review comment.

seiya marked an inline comment as done.Oct 1 2019, 7:26 AM
seiya added a comment.Oct 7 2019, 8:31 PM

Friendly ping: I'd like someone who is familiar with Mach-O to review this patch.

seiya added a comment.Oct 16 2019, 1:44 AM

@alexshap Could you review this patch? It's accepted but I'd like someone who's familiar with Mach-O to confirm (a ping in case you missed changes in this patch).

alexander-shaposhnikov added inline comments.
llvm/tools/llvm-objcopy/CopyConfig.cpp
429

khm, maybe there is a named constant for this ? (somewhere in llvm::MachO:: )

@seiya - thanks for all your work, I'm wondering - can you commit this change / is there anything blocking this diff ?

seiya marked 2 inline comments as done.Oct 16 2019, 6:55 PM

Thank you for your review!

In D65541#1711938, @alexshap wrote:

@seiya - thanks for all your work, I'm wondering - can you commit this change / is there anything blocking this diff ?

There're no diffs that blocks this patch. I'll commit this patch along with D66280 and D66281.

Other Mach-O support patches for llvm-objcopy needs addressing some review comments. I'll let you know once I update them.

llvm/tools/llvm-objcopy/CopyConfig.cpp
429

I've git-grep'ed through llvm::MachO and lld but there were no constant for it (lld hard-codes the value).

seiya updated this revision to Diff 225551.EditedOct 17 2019, 6:02 PM
seiya marked an inline comment as done.

Before committing this patch, I noticed that we need to consider how llvm-objcopy should handle wildcards in Mach-O support.

First I thought that validating the segment/section names would be helpful for users, for example:

$ llvm-objcopy llvm-objcopy --only-section __text foo
error: invalid section name '__text' (should be formatted as '<segment name>,<section name>')

However, the validation (check that the option value is formatted as <segment name>,<section name>) does not make sense when taking into account wildcards. In GNU objcopy, it seems it does not care about the segment/section name separator (.) at all. In the following example, GNU objcopy accepts both __TEXT* and __TEXT.* and outputs identical outputs.

$ gobjcopy llvm-objcopy --only-section '__TEXT*' foo foo2
$ gobjcopy llvm-objcopy --only-section '__TEXT.*' foo foo3
$ cmp foo2 foo3
$ echo $?
0

In this change, I've removed the validation to support the GNU objcopy's wildcard behavior for Mach-O objects as described above and added a test case for wildcard.

seiya added a comment.Oct 25 2019, 2:39 AM

Ping @alexshap (and also @jhenderson and @rupprecht would want to confirm before landing). After this patch got accepted, in order to merge it, I added wildcard support and a test in this diff.

seiya updated this revision to Diff 226404.Oct 25 2019, 3:53 AM

Updated a test.

rupprecht accepted this revision.Oct 25 2019, 2:19 PM

LGTM with the wildcard changes, thanks!

This revision was automatically updated to reflect the committed changes.