This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy][MachO] Support load commands used in executables/shared libraries
ClosedPublic

Authored by seiya on Jun 16 2019, 8:45 PM.

Details

Summary

This patch implements copying some load commands that appear in executables/shared libraries such as the indirect symbol table.

I don't add tests intentionally because this patch is incomplete: we need a layout algorithm for executables/shared libraries. I'll submit it as a separate patch with tests.

Diff Detail

Repository
rL LLVM

Event Timeline

seiya created this revision.Jun 16 2019, 8:45 PM
seiya retitled this revision from [llvm-objcopy][MachO] Copy load commands used in executables and shared libraries to [llvm-objcopy][MachO] Support load commands used in executables/shared libraries.Jun 17 2019, 5:47 PM

I've subscribed @mtrent, who I believe is Mach-O experienced (sorry if I'm wrong!). He might be able to assist with the review more.

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
60 ↗(On Diff #204990)

which employed -> which are employed

(I think)

MaskRay added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOReader.cpp
227 ↗(On Diff #204990)

This can be written as: arrayRefFromStringRef(MachOObj.getData().substr(...));

240 ↗(On Diff #204990)

arrayRefFromStringRef

llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
455 ↗(On Diff #204990)

emplace_back saves a pair of {}.

seiya updated this revision to Diff 205492.Jun 18 2019, 6:34 PM
seiya marked an inline comment as done.
  • Fix a comment.
  • Use arrayRefFromStringRef.
  • Use emplace_back.
seiya marked 4 inline comments as done.Jun 18 2019, 6:35 PM

sorry about the delay with the review, I've been (and probably will be over the next few weeks) quite busy recently, but I'll take a closer look at the diff tomorrow.

Regarding tests - even without computing the new layout - could we add a test which simply copies a .so and verifies that the output is the same ?

llvm/tools/llvm-objcopy/MachO/MachOReader.cpp
244 ↗(On Diff #205492)

maybe

++i
llvm/tools/llvm-objcopy/MachO/MachOWriter.h
28 ↗(On Diff #205492)

so after some thinking about what's going on here it seems to me
maybe it's possible to move the logic for building the layout from MachOWriter into a separate class MachOLayoutBuilder, basically it's motivated by "separation of concerns" / desire to simplify a bit what's going on here. Roughly speaking the idea is the following:

  1. MachOLayoutBuilder will take the model (Object) and update all the offsets / sizes. In particular, it might hold some additional state internally (like MachO::macho_load_command *LinkEditLoadCommand) and will be responsible for calling the private subroutines (computeSizeOfCmds, layoutSegments, layoutRelocations, updateDySymTab, layoutTail) in the correct order, etc. The contract here is that after calling its public method build() the layout is final and one can safely use it to calculate / get the total size of the binary, all the offsets/sizes are valid etc.
  1. MachOWriter will be responsible for managing the Buffer + writing things into it properly, but it will never modify the model (Object) directly (but internally it can make use of MachOLayoutBuilder before writing things out).

What do you think ?

llvm/tools/llvm-objcopy/MachO/MachOWriter.h
28 ↗(On Diff #205492)

in 2. i meant that it would not modify the offsets/sizes directly.

In general, i don't insist on this approach since things are quite complex here, we have to update / rebuild many other things besides the offsets/sizes. If we could split out that logic - that would be cool imo / it would enable us to unload some complexity from the writer, this was the main point of my comment.

seiya planned changes to this revision.Jun 20 2019, 2:35 AM
seiya marked an inline comment as done.

I'll separate the layout logic into a new class: MachOLayoutBuilder

llvm/tools/llvm-objcopy/MachO/MachOWriter.h
28 ↗(On Diff #205492)

That sounds good to me. I'll try isolating the layout logic into a separate class.

seiya updated this revision to Diff 205936.Jun 20 2019, 6:14 PM
  • Isolate the logic which modifies the object (e.g. updating file offsets) into MachOLayoutBuilder.cpp
seiya added a comment.Jun 20 2019, 6:20 PM

Ah, should MachOLayoutBuilder be a separate NFC patch?

seiya marked an inline comment as done.Jul 3 2019, 4:42 PM

Ping

sorry about the delay, will get to these reviews today

it's simply a very very busy season

llvm/tools/llvm-objcopy/MachO/MachOLayoutBuilder.cpp
61 ↗(On Diff #205936)

I'd be defensive here: add an assert/logical check (for example, via std::is_sorted + custom comparator)
+ probably we probably need to actually sort at some point, don't we ?

75 ↗(On Diff #205936)

nit: ++Iter

79 ↗(On Diff #205936)

nit: ++NumExtDefSymbols

llvm/tools/llvm-objcopy/MachO/MachOLayoutBuilder.cpp
55 ↗(On Diff #205936)
Symbol->Index = Index++;
75 ↗(On Diff #205936)

and the same above

seiya updated this revision to Diff 210508.Jul 18 2019, 2:57 AM
seiya marked 6 inline comments as done.

Addressed review comments.

seiya marked 2 inline comments as done.Jul 18 2019, 3:07 AM
seiya added inline comments.
llvm/tools/llvm-objcopy/MachO/MachOLayoutBuilder.cpp
61 ↗(On Diff #205936)

Yes we will need to sort symbols when we support adding symbols.

seiya marked an inline comment as done.Jul 18 2019, 3:07 AM

Thank you for your review comments, @alexshap! I've updated the patch.

the code looks good to me, but we need tests. I'd add simple tests which take YAML (similarly to how the existing tests work), generate MachO (using yaml2obj), copy it using llvm-objcopy,
dump the results (using llvm-readobj and obj2yamal) and verify that the output is what we expect. Plus those tools will parse the binary generated by llvm-objcopy thus they will verify (to some extent) that we have built a valid MachO object file.

seiya added a comment.Jul 23 2019, 8:52 PM
In D63395#1598087, @alexshap wrote:

the code looks good to me, but we need tests. I'd add simple tests which take YAML (similarly to how the existing tests work), generate MachO (using yaml2obj), copy it using llvm-objcopy,
dump the results (using llvm-readobj and obj2yamal) and verify that the output is what we expect. Plus those tools will parse the binary generated by llvm-objcopy thus they will verify (to some extent) that we have built a valid MachO object file.

I'd like to add tests in the next (and the last piece for executable support) patch which implements layout algorithm for executables/shared libraries. This patch focuses on copying some load commands (LC_DYSYMTAB, etc.) that are used in them.

This revision is now accepted and ready to land.Jul 24 2019, 12:09 AM
seiya added a comment.Jul 25 2019, 1:08 AM

Because this patch does not include any tests, I'll commit it once the next patch (which I'll upload tomorrow) gets accepted.

seiya updated this revision to Diff 215819.Aug 18 2019, 10:28 PM

Rebase for the sake of arc patch.

This revision was automatically updated to reflect the committed changes.

Ah, it looks this broke because it adds a new file to CMakeLists.txt. Just confirm, can I ignore such breakage? It will be resolved by a "gn build: Merge ..." commit.

Ah, it looks this broke because it adds a new file to CMakeLists.txt. Just confirm, can I ignore such breakage? It will be resolved by a "gn build: Merge ..." commit.

GN build bots shouldn't be sending failure notifications. If you get any from them, ignore them (so it's safe to recommit your patch), but probably worth posting on the mailing list saying that such a bot is sending them.