Page MenuHomePhabricator

[llvm-objcopy] First bits for MachO
ClosedPublic

Authored by alexshap on Nov 17 2018, 6:53 PM.

Details

Summary

This diff adds the ability to copy without modification MachO object files.
The code will obviously evolve, significantly change and extend in the future, so will do the tests,
this is just the beginning to start the ball rolling.

Test plan: make check-all

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
alexshap changed the visibility from "All Users" to "Public (No Login Required)".Nov 17 2018, 6:59 PM
alexshap updated this revision to Diff 174525.Nov 17 2018, 7:08 PM

typo fixed

Higuoxing added inline comments.Nov 17 2018, 10:28 PM
tools/llvm-objcopy/MachO/Object.h
121 ↗(On Diff #174525)

Here's a small typo: initialized

I don't know anything about the MachO file format really, so I'm not sure I'm well qualified to review the file format side of this. I've made a couple of drive-by nitpick comments though, and may come back later with more substantial comments.

One general review-related comment, is this the minimal required to get a trivial MachO file copied? If it isn't, it might be nice to ignore the unnecessary components for this first commit. Of course, this assumes that we are thinking ahead to dependency issues, so that we don't end up with the issues we had with ELF support.

One more general comment I have, that doesn't really affect this change, but will impact later things, is that I suspect adding MachO support will require lots of new switches. If these switches aren't applicable to ELF, I'd like us to consider how we can sanitize the help text, so that ELF-only options and MachO-only options are not visible to users of the other version. I'm not sure how to achieve that necessarily though. In LLD and llvm-readobj, the executable name has an effect on option support. We already do something similar with llvm-strip versus llvm-objcopy too. One possibility might be to do the same here (e.g. llvm-elfcopy for ELF switches, llvm-machocopy for MachO switches, and llvm-objcopy for all). Alternatively, if that's not really viable, or possibly in addition to this, I'd support having some easy way of disabling (or at least hiding) all options related to a specific file format, so that downstream ports don't have to worry about file formats that they don't care about.

test/tools/llvm-objcopy/MachO/basic-copy.test
1 ↗(On Diff #174525)

May be worth naming this file basic-little-endian-copy.test.

I take it MachO doesn't have 32/64 bit variations?

tools/llvm-objcopy/MachO/Object.h
93 ↗(On Diff #174525)

My reading of the LLVM coding standards suggests that we should only be using C++ style comments here, rather than C-style.

tools/llvm-objcopy/llvm-objcopy.cpp
27 ↗(On Diff #174525)

Spurious blank line?

alexshap updated this revision to Diff 174656.Nov 19 2018, 11:37 AM

Address comments, improve the tests, fix some typos.

@jhenderson, yeah, this is the minimal amount of code i was able to come up with for coping over MachO object files without modifications, so essentially the only useful (hopefully) things here are serialization/deserialization. The "intermediate" model is a stub and will change in the future. However, even with this minimal capabilities we can do smth useful (in the nearest future), in particular, in a follow-up patch i was planning to add support for "fat" (universal) binaries, and as soon as it is done (together with this patch) we will be able to build an llvm-based drop-in replacement for the tool "lipo".

alexshap updated this revision to Diff 174660.Nov 19 2018, 11:54 AM

one more typo fixed

also, @jakehehrlich, @jhenderson, @rupprecht - what would you say to moving the existing tests (llvm/test/tools/llvm-objcopy) into the subfolder ELF (llvm/test/tools/llvm-objcopy/ELF) ? if it's ok - i can do that on a separate diff.

Are either of you working on a COFF backend as well? I'm looking into that at the moment and can probably give it a try soon.

alexshap added a comment.EditedNov 19 2018, 1:45 PM

@mstorsjo, no, I'm not working on COFF, i have a few patches for MachO (basically to enable building llvm-based replacements for "lipo" and "install_name_tool"). But i'm interested in COFF as well - so please, add me to those diffs (if you don't mind)

I like the planned direction for features here! I'm going to add @echristo to this since I think he knows MachO reasonably well and would know someone who can review those aspects for us.

also, @jakehehrlich, @jhenderson, @rupprecht - what would you say to moving the existing tests (llvm/test/tools/llvm-objcopy) into the subfolder ELF (llvm/test/tools/llvm-objcopy/ELF) ? if it's ok - i can do that on a separate diff.

I'm cool with that, I think that's just the natural extension of what we've already done.

Also I'll be gone this week for Thanksgiving. I might look at this off and on but don't expect too much this week

ok, no worries, if smb has a look at it next week - that will be wonderful. Many thanks.

I'd be interested to hear yours and other people's thoughts on not calling reportError inside the MachO layer and propagating it out at least to executeObjcopyOnBinary, via llvm::Error/Expected. I know @jakehehrlich was talking about this in the ELF side at one point too.

FYI, I'm not working a COFF-layer, but it would make a lot of sense to put it in as well (but see my earlier comments about executable naming and option visibility etc).

test/tools/llvm-objcopy/MachO/real-world-input-32-copy.test
11–12 ↗(On Diff #174660)

Not that it really matters, but it would be nice to add a comment marker or similar here, e.g. '#'.

test/tools/llvm-objcopy/MachO/real-world-input-64-copy.test
11 ↗(On Diff #174660)

See above.

tools/llvm-objcopy/MachO/MachOReader.cpp
11 ↗(On Diff #174660)

Whilst it is valid, I don't particularly like relative paths in #includes. Is it possible to add an include directory to the top-level of llvm-objcopy, so that you can do #include "llvm-objcopy.h" and similarly include the MachO specific stuff as #include "MachO/SomeFile.h"

Ping @echristo Eric, could you please help review this patch / or add smb else ?

Added Jurgen who should be able to help with the MachO review (or finding a reviewer) as well.

also, @jakehehrlich, @jhenderson, @rupprecht - what would you say to moving the existing tests (llvm/test/tools/llvm-objcopy) into the subfolder ELF (llvm/test/tools/llvm-objcopy/ELF) ? if it's ok - i can do that on a separate diff.

Sounds like a good idea to me, as long as "ninja check-llvm-tools-llvm-objcopy" will still run both ELF and MachO tests. Agreed that it doesn't have to happen in this commit.

test/tools/llvm-objcopy/MachO/basic-big-endian-32-copy.test
4 ↗(On Diff #174660)

The existing ELF tests use llvm-readobj instead of obj2yaml to verify things, is it possible to use that for MachO as well?

116 ↗(On Diff #174660)

Can you also verify things coming from sections/symbol tables/etc? It seems this only tests that the FileHeader is preserved and everything else could be dropped...

I had a few bike-shed level comments. I'll dig into the code more deeply soon but my near complete lack of knowledge of MachO makes giving any actually useful feedback hard.

The high level structure looks good to me. Looks like we're avoiding lots of mistakes I made in the ELF code base which is good.

tools/llvm-objcopy/MachO/MachOObjcopy.cpp
21 ↗(On Diff #174660)

Where is Object forward declared to make this valid?

tools/llvm-objcopy/MachO/Object.h
73 ↗(On Diff #174660)

This appears to be a maximally sized version of something that already exists in BianryFormat for MachO. Can we use expanded names instead of mimicked names?

90 ↗(On Diff #174660)

Early on I took this approach for ELF as well but it wound up being a mistake for ELF. 1) Something already existed that would construct string tables for me and 2) It didn't handle sharing correctly. Do these issues not occur in MachO?

200 ↗(On Diff #174660)

For indexes would it be better to use uint64_t instead of size_t?

tools/llvm-objcopy/llvm-objcopy.cpp
130 ↗(On Diff #174660)

Do we want to be silent on mixed ELF/MachO archives? It's an absurd case that I don't suspect will ever happen it was just a thought I had. I'm fine leaving this as is.

alexshap marked an inline comment as done.Nov 29 2018, 7:52 PM
alexshap added inline comments.
tools/llvm-objcopy/MachO/MachOObjcopy.cpp
21 ↗(On Diff #174660)

both MachOReader.h and MachOWriter.h need the definition of the class Object (since they access its methods/fields), Object.h propagates to this file through those headers. Probably, I'd be better no to rely on it and include Object.h explicitly, will do.

alexshap marked an inline comment as done.Nov 29 2018, 7:58 PM
alexshap added inline comments.
tools/llvm-objcopy/MachO/Object.h
200 ↗(On Diff #174660)

MachO header (even for 64 bit) uses uint32_t for the total size of load commands => the number of load commands can not be greater than uint32_t either.

alexshap marked an inline comment as done.Nov 29 2018, 8:30 PM
alexshap added inline comments.
test/tools/llvm-objcopy/MachO/basic-big-endian-32-copy.test
116 ↗(On Diff #174660)

there is a problem with obj2yaml / yaml2obj - the support for MachO is incomplete, some things (like the content of the sections) are not dumped, maybe smth else is missing too. But in these tests (in this patch) where we copy the object files without modifications the real check is done on the line 3: cmp %t %t2,
I've added FileCheck just to be sure that yaml2obj has done the right thing, but probably I should remove it here to avoid confusions.
Regarding your other comment (about making use of llvm-readobj for tests) -
yeah, I think llvm-readobj should work for some scenarios, although with llvm-readobj there is an issue - it doesn't print all the mach-o specific bits (i.e. all the load commands).
So basically my proposal for this: in these tests cmp should be sufficient, later I will have to use yaml2obj & llvm-readobj to test the changes of the load commands, when I run into an issue that smth is not printed by the existing tools I will try to extend obj2yaml/llvm-readobj to support it.

@beanz Chris, would you have a minute to have a look at this diff ? (though this is just the beginning, so things will change)

beanz added a reviewer: mtrent.Dec 4 2018, 3:06 PM
beanz added a subscriber: mtrent.

From an Apple perspective this almost certainly needs @mtrent's eyes, as he owns our binary tools.

rupprecht added inline comments.Dec 4 2018, 4:01 PM
test/tools/llvm-objcopy/MachO/basic-big-endian-32-copy.test
116 ↗(On Diff #174660)

Sounds good to me.

Replying to this comment, but also including something from @jakehehrlich (which was on the email review thread but I can't find it in phab...)

I always regretted and disliked using yaml2obj. It works fine but honestly I prefer assembly. For ELF this means that program headers can't be tested inside of llvm since only the linker produces those. Early on I felt it critical to focus on fully linked executables and stripping them because that was the use case my team had at the time (and output to binary actually). It almost accidentally occurred that relocatable files were even supported. Today you could probably test the bulk of objcopy using the asembler and leave just a limited number of use cases to hand crafted ELF files and yaml2obj for when you're testing a valid ELF file that happens to have program headers. So my gut would be to say we should prefer using assembly to generate .o files rather than yaml2obj or uploading binaries we've built. Uploading binaries should be reserved for cases when both yaml2obj and assembly cannot produce a file that tickles the code you're trying to test which is almost always something that not even the linker can produce in my experience.

My comment about using yaml2obj + llvm-readobj isn't really about preference for those tools, just that we should have consistent tests:

  • Ideally there would only be one testing method used (e.g. every test uses yaml2obj to make the obj, run llvm-objcopy/strip, then llvm-readobj to verify it)
  • If not, there should be a priority list, e.g. generated files use assembly if possible, yaml2obj if not possible w/ assembly, and checked in object files if neither are possible; similarly verification should use llvm-readobj if possible, if not then obj2yaml, if not then cmp

I'm fine if we want to use assembly instead of yaml2obj. We should update all the tests (where possible) to do that if that's what we decide.

More explicitly, we should not have:

  • Test1 over here uses assembly because $person1 likes assembly
  • Test2 over there uses llvm IR because $person2 likes llvm IR
  • $person3 needs to fix a bug and update Test1 and Test2, and has to do it in two different ways because $person1 and $person2 have different preferences. $person3 gives up in frustration and abandons fixing the bug.

If there's any specific problems (e.g. llvm-readobj doesn't print all the fields for mach-o files), it's certainly fine to use some other tool, although it'd be nice if there's a comment/TODO saying why we aren't using the "normal" testing process, so that it could be changed later (e.g. in this case if llvm-readobj is improved w.r.t. mach-o support needed here)

I guess I don't actually have any blocking comment here, I just wanted to hijack this patch to have this discussion and we can do something independent of this patch :) Carry on!

test/tools/llvm-objcopy/MachO/real-world-input-32-copy.test
11–12 ↗(On Diff #174660)

"//" Might even be a better comment choice so that this whole file could be fed to clang++ to regenerate the object file

24 ↗(On Diff #174660)

This doesn't seem to have anything macho specific in it. Is there a way to run this command from something besides osx and generate a macho file? (-target XXX?)

beanz added a comment.Dec 4 2018, 4:49 PM

My $0.02 on the yaml discussion for tests:

The primary goal of the ObjectYAML tools for MachO was testing tools that cared more about the structure of the file than the data contained in the sections. It allowed the tools to verify load command parsing, object file layout, linkedit data parsing, and debug information (although DWARF5 support is mostly missing). At the time when I stopped working on the YAML tools, the MachO support was complete for load commands, linkedit, and debug information. Data and text section information was not preserved, although the section descriptions themselves were, so the obj->yaml->obj flow would zero out the section data resulting in an appropriately sized and laid out file.

The YAML representation for MachO differs from the ELF or COFF implementations in that it was intended to support converting a MachO generated by the compiler to YAML, round trip it back to binary, and get bit-for-bit identical output for the fields that are preserved in the yaml. This is very useful for certain types of testing (nm, objdump, debug info, etc). Because of the differences in approach between MachO's YAML tooling and the ELF/COFF yaml tooling I think they reasonably have different applications in testing.

In the cases here I'm not sure that obj2yaml | FileCheck really adds any additional value over cmp, but I do think that for MachO files, unless you really care about the section data, writing tests as yaml should be preferred over binary or assembly tests. Yaml should be preferred over binary files because binary test files are harder to understand and work with, and yaml should be preferred over assembly because it narrows the testing surface (an assembler bug resulting in objcopy failing tests would be unfortunate).

alexshap marked an inline comment as done.Dec 6 2018, 3:42 PM
alexshap added inline comments.
test/tools/llvm-objcopy/MachO/real-world-input-32-copy.test
24 ↗(On Diff #174660)

yes, i'll update the comments

Hey! I've recently pinged multiple people via e-mail (sorry about the disturbance, I apologize for all the inconveniences and I do understand that people are probably very busy given that it's the end of the half),
@mtrent, @beanz , @ributzka @lhames, @echristo, @compnerd - just give me an idea if any of you will eventually have a look at it (imo it'd be good to have people from Apple involved in the code review since this code is for MachO, who knows - maybe you will find it useful at some point - i.e. as a potential replacement/replacements for cctools in the future). If it's not super-interesting - then it's okay...., let's just make some decision if there is a way to move forward here (which you would be comfortable with) or not. (The main reason why I'm bothering you & asking this question - this diff has been around for 3+ weeks without much visible progress)

lhames added a comment.Dec 6 2018, 8:45 PM

Apart from these specific comments, my general thought is that this is fine as far as it goes, but as it grows more fully featured it seems likely to start overlapping more libObject and MC functionality.

Have you considered ways to refactor libObject/MC to allow objcopy to be built on top of them?

test/tools/llvm-objcopy/MachO/basic-big-endian-32-copy.test
116 ↗(On Diff #174660)

I always regretted and disliked using yaml2obj. It works fine but honestly I prefer assembly. For ELF this means that program headers can't be tested inside of llvm since only the linker produces those.

That seems like something that we should consider adding to yaml2obj?

In my mind, yaml2obj ought to be a tool that lets you to describe any valid object file in YAML, whereas assemblers will usually make some assumptions (e.g. load-command ordering) about how you want the object laid out.

test/tools/llvm-objcopy/MachO/real-world-input-32-copy.test
24 ↗(On Diff #174660)

Actually it would be check in the assembly for this (clang -s), then use llvm-mc to assemble it in the test case, with something like:

llvm-mc -assemble -triple x86_64-apple-macosx10.14.0 -filetype=obj -o %t %s

That should allow you to do this without checking in any binaries.

tools/llvm-objcopy/MachO/MachOReader.cpp
49–59 ↗(On Diff #174660)

Is there any reason to use templates here, rather than ordinary overloads?

tools/llvm-objcopy/MachO/MachOWriter.cpp
224–232 ↗(On Diff #174660)

Strings is a vector of std::strings and those are guaranteed to be null-terminated. You could simplify this to:

for (size_t Index = 0; Index < O.StrTable.Strings.size(); ++Index) {
  bool NullTerminate = Index + 1 != O.StrTable.Strings.size();
  memcpy(Out, O.StrTable.Strings[Index].data(),
           O.StrTable.Strings[Index].size() + NullTerminate);
  Out += O.StrTable.Strings[Index].size() + 1;
}
304–307 ↗(On Diff #174660)

This seems to be doubled up.

tools/llvm-objcopy/MachO/Object.h
90 ↗(On Diff #174660)

It didn't handle sharing correctly. Do these issues not occur in MachO?

I'm not sure. What is "sharing" in this context?

alexshap planned changes to this revision.Dec 7 2018, 11:17 AM

will address the comments and update this diff, huge thanks for the review

plotfi added a subscriber: plotfi.Dec 9 2018, 11:04 PM

Apart from these specific comments, my general thought is that this is fine as far as it goes, but as it grows more fully featured it seems likely to start overlapping more libObject and MC functionality.

Have you considered ways to refactor libObject/MC to allow objcopy to be built on top of them?

Thanks for the feedback Lang!

Much Appreciated.

PL

Apart from these specific comments, my general thought is that this is fine as far as it goes, but as it grows more fully featured it seems likely to start overlapping more libObject and MC functionality.

Have you considered ways to refactor libObject/MC to allow objcopy to be built on top of them?

It's been suggested before (https://lists.llvm.org/pipermail/llvm-dev/2018-October/126661.html), but no concrete proposals have been put forward AFAIK.
With this and D54939, llvm-objcopy is about to go from ELF to ELF+MachO+COFF. IMO, we shouldn't block either of these patches on that idea, but we should start moving bits from llvm-objcopy into libObject soon after landing both:

  1. Landing these patches sooner will give people more time to play around with MachO/COFF support in the tool (ELF support is fairly well known)
  2. Being able to refactor things once all three object formats are in will help motivate the design (i.e. won't be an ELF-centric design)
alexshap marked an inline comment as done.Dec 19 2018, 1:56 AM
alexshap added inline comments.
tools/llvm-objcopy/llvm-objcopy.cpp
130 ↗(On Diff #174660)

yeah, certainly we can check that, but doesn't seem to be worth it

alexshap updated this revision to Diff 178849.Dec 19 2018, 2:05 AM

Address comments + rename some files for consistency with coff. @lhames , @jakehehrlich - if you find a minute to have a look at this new version - that will be wonderful, many thanks in advance

jakehehrlich accepted this revision.Dec 19 2018, 1:19 PM

I think this looks pretty good. I'd like an explicit public approval from someone who knows MachO better but pending that and a few relatively minor requests I think I'm happy. I've gone over the high level structure and approve of it. I see major obvious issues in the code, it looks preety good to me. I'll defer to someone else on the MachO details. Testing looks pretty good but I don't know the full space very well so I'd like a MachO person to look at that as well.

LGTM. Please wait for a public approval from someone like @lhames before landing. I'd like such a person to comment that both the code and the testing looks good to them if possible.

test/tools/llvm-objcopy/MachO/basic-big-endian-32-copy.test
116 ↗(On Diff #174660)

Follow up on this. It's far from it in its current state and it would take quite a lot to get it there. Also consider that when testing you often want to create invalid ELF files as well. With the exception of Content, this is quite hard to accomplish with yaml2obj.

tools/llvm-objcopy/MachO/Reader.cpp
68 ↗(On Diff #178849)

How hard would it be to put this in an ArrayRef? This loop handles the case where the load command is larger than the array of sections. 1) Can we error out if that is the case 2) How tricky is it to alignFrom based on the size of SectionType? I really like to put things in ranges and first and foremost try to use an existing std:: algorithm and then a range based for loop, and only then do I consider something else. This loop doesn't neatly fit into an algorithm but it seems like a range based for loop is doable without too much issue.

84 ↗(On Diff #178849)

I should have checked the COFF code for this sort of issue a bit more than I did but I'd like to avoid this where possible. Failing on the first sign of error will slow the pace at which we can convert this into a library which is a long term plan I'd like to see become a reality.

tools/llvm-objcopy/MachO/Writer.cpp
343 ↗(On Diff #178849)

ditto. This one might require changing the interface of MachOWriter and diverge from the interface that I used in the ELF code. I just consider the way I handled fatal errors a mistake.

This revision is now accepted and ready to land.Dec 19 2018, 1:19 PM

Thanks, definitely I'll wait for @lhames

alexshap marked an inline comment as done.Dec 20 2018, 4:23 PM
alexshap added inline comments.
tools/llvm-objcopy/MachO/Reader.cpp
84 ↗(On Diff #178849)

ok, I will change this code (and the same stuff below) & update the diff.

@lhames , can we proceed here ?

@lhames , can we proceed here ?

I think he's on vacation.

@echristo, oh, i see, thanks.

mtrent accepted this revision.Jan 7 2019, 7:14 PM

The approach here seems reasonable. I have not tested these changes.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2019, 4:38 PM