Page MenuHomePhabricator

Regenerate LC_CODE_SIGNATURE during llvm-objcopy operations
Needs ReviewPublic

Authored by nuriamari on Tue, Oct 5, 8:26 AM.

Details

Reviewers
alexander-shaposhnikov
rupprecht
jhenderson
gkm
drodriguez
steven_wu
davide
Group Reviewers
Restricted Project
Summary

Context:

This is a second attempt at introducing signature regeneration to llvm-objcopy. In this diff: https://reviews.llvm.org/D109840, a script was introduced to test
the validity of a code signature. In this diff: https://reviews.llvm.org/D109803 (now reverted), an effort was made to extract the signature generation behavior out of LLD into a common location for use in llvm-objcopy. In this diff: https://reviews.llvm.org/D109972 it was decided that there was no appropriate common location and that a small amount of duplication to bring signature generation to llvm-objcopy would be better. This diff introduces this duplication.

Summary

Prior to this change, if a LC_CODE_SIGNATURE load command
was included in the binary passed to llvm-objcopy, the command and
associated section were simply copied and included verbatim in the
new binary. If rest of the binary was modified at all, this results
in an invalid Mach-O file. This change regenerates the signature
rather than copying it.

The code_signature_lc.test test was modified to include the yaml
representation of a small signed MachO executable in order to
effectively test the signature generation.

Diff Detail

Event Timeline

nuriamari created this revision.Tue, Oct 5, 8:26 AM
Herald added a reviewer: gkm. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
Herald added a subscriber: abrachet. · View Herald Transcript
nuriamari edited the summary of this revision. (Show Details)Tue, Oct 5, 9:59 AM
nuriamari added a reviewer: drodriguez.
nuriamari updated this revision to Diff 377293.Tue, Oct 5, 10:32 AM

Rename CodeSignatureSectionInfo struct

nuriamari published this revision for review.Tue, Oct 5, 10:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptTue, Oct 5, 10:33 AM

I've added some in-line comments, but the high level approach looks reasonable to me, thanks for working on this!

llvm/tools/llvm-objcopy/MachO/MachOLayoutBuilder.cpp
289

nit: maybe drop "Section" from the name ?, i.e. CodeSignatureSection -> CodeSignature, CodeSignatureSectionSize -> CodeSignatueSize (it's a bit shorter + it's not a section per se).

llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
482

1/ just in case - what happens if there is no __TEXT in the binary - what does codesign do in this case - does it still create something or bail out ?
2/ is there any other place where getSegmentFileOffset / getSegmentFileSize are used ?
(if no - then probably it'd be better not to store them and just calculate here or have a helper function or method that returns them) (they also can change during some binary transformations e.g. adding / removing load commands) (this might not be observable on small tests because of the alignment)

llvm/tools/llvm-objcopy/MachO/Object.h
354–355

maybe it'd be better to make it a part of MachOLayoutBuilder ?
(since it appears to be used only there)

nuriamari updated this revision to Diff 377847.Thu, Oct 7, 7:43 AM
nuriamari marked 3 inline comments as done.

Remove "Section" in variable names, move CodeSignature to MachOLayoutBuilder, move text segment helpers

nuriamari added inline comments.Thu, Oct 7, 7:44 AM
llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
482

1/ Behaviour is the same as far as I can tell, checks for __TEXT segment here and if the segment isn't found, creates an "empty" OutputSegment, which has fileoff and filesize values of 0.

2/ No other usages, I will move the helpers here. I don't think getting stale values is an issue since the calls are made after the MachOLayoutBuilder updates the values.

llvm/tools/llvm-objcopy/MachO/Object.h
354–355

It is also used in the MachOWriter, but I suppose the Writer can reach into the LayoutBuilder for that information.

The symlink creates a dependency from LLVM to lld, while lld depends on LLVM ^^

The symlink creates a dependency from LLVM to lld, while lld depends on LLVM ^^

Would creating the symlink in the reverse direction be acceptable?

drodriguez added inline comments.Thu, Oct 7, 9:21 AM
llvm/tools/llvm-objcopy/MachO/Object.h
354–355

@alexander-shaposhnikov: Can we talk again about this? It feels as if the rest of the LinkData blobs have their information hold in this Object struct. Most of them are simply the data itself, but for other things, like IndirectSymbolTable, it seems that they are defined here, and used in the MachOReader, the MachOLayoutBuilder and the MachOWriter. In the case of the LC_CODE_SIGNATURE, we don't have any use in MachOReader because the information is discarded, but it seems to me that this is the correct place. Additionally, to be usage from MachOWriter and MachOLayoutBuilder, it needs to end up as a public instance variable in MachOLayoutBuilder, making it the first public instance variable of that class.

In any case, it is a minimal thing, and I don't need it to be changed. I just wanted to point out the similarities with the code here.

The symlink creates a dependency from LLVM to lld, while lld depends on LLVM ^^

Would creating the symlink in the reverse direction be acceptable?

That sounds better. I don't know how happy Windows is about symlinks. I have the feeling that you will accept a bit of code duplication. How about two copies?

llvm/tools/llvm-objcopy/MachO/Object.h
354–355

the pipeline looks like this: parse the input (the output is Object) -> apply various transformations (to Object) -> write the final output (including the calculations of the new layout). Therefore, storing "offsets" in Object doesn't appear to be right, another indication why it's not right - they are not used (and should not be used (because they can be invalidated by some transformations as I pointed out in another comment)) before emitting the new object file.

llvm/tools/llvm-objcopy/MachO/Object.h
354–355

regarding public instance - unless I'm missing something this won't happen.
Currently these values are calculated in MachOLayoutBuilder::layoutTail, it's perfectly fine for layoutTail to have a write access a member of the same class. External customers can use a getter that returns a const ref.

llvm/tools/llvm-objcopy/MachO/Object.h
101

looks like in the latest revision this file has not been updated

drodriguez added inline comments.Thu, Oct 7, 2:16 PM
llvm/tools/llvm-objcopy/MachO/Object.h
354–355

Understood. Thanks for explaining the concepts behind the code flow. This blob size and contents depending on the previous content is a little bit of a problem (specially the total size), and specially the size needed to be included in the __LINKEDIT information calculated during the layout.

nuriamari updated this revision to Diff 378186.Fri, Oct 8, 5:42 AM
nuriamari marked 5 inline comments as done.

Copy test script, remove getSegment* declarations, add const ref getter

llvm/tools/llvm-objcopy/MachO/Object.h
302–308

would be good to move this struct declaration too (into MachOLayoutBuilder.h)

Just in case - do we have a test (for the new functionality) where the index of LC_SEGMENT* and the offset of __TEXT are modified ? (to ensure that the signature is calculated correctly in this case)

Just in case - do we have a test (for the new functionality) where the index of LC_SEGMENT* and the offset of __TEXT are modified ? (to ensure that the signature is calculated correctly in this case)

With the previous version of the code I remember checking if llvm-strip was generating the right thing and it did work (which was very nice to see). Is that something like what you asking? Do you have some existing test in mind that we can use as a starting point?

adding / removing load commands (e.g. rpath) can trigger such changes (but beware the alignment + the order of load commands matters) so the test needs to be specifically designed for these purposes

(I would use llvm-objdump or llvm-readobj to verify that the offset of __TEXT has changed etc)

P.S. btw - you probably want to update TextSegmentCommandIndex inside updateLoadCommandIndexes(..) and such a test would catch it, alternatively, it can be calculated where it's actually used (and the field can be dropped)

adding / removing load commands (e.g. rpath) can trigger such changes (but beware the alignment + the order of load commands matters) so the test needs to be specifically designed for these purposes

(I would use llvm-objdump or llvm-readobj to verify that the offset of __TEXT has changed etc)

P.S. btw - you probably want to update TextSegmentCommandIndex inside updateLoadCommandIndexes(..) and such a test would catch it, alternatively, it can be calculated where it's actually used (and the field can be dropped)

Hi Alexander,

Thanks for your feedback, sorry I've been a little inactive the last few days. I have moved the CodeSignatureInfo struct and I will add a new test like you are suggesting. I am little busy with other work at the moment, but I will try to get to it later this week.

adding / removing load commands (e.g. rpath) can trigger such changes (but beware the alignment + the order of load commands matters) so the test needs to be specifically designed for these purposes

(I would use llvm-objdump or llvm-readobj to verify that the offset of __TEXT has changed etc)

P.S. btw - you probably want to update TextSegmentCommandIndex inside updateLoadCommandIndexes(..) and such a test would catch it, alternatively, it can be calculated where it's actually used (and the field can be dropped)

OK. We have been looking at what you were pointing out in this comment, and we want to run the plan by you (and anyone else reading), so we don't detour very far.

  • Fix updateLoadCommandIndex(...) to handle both the new TextSegmentCommandIndex and also CodeSignatureCommandIndex (this is an error introduced before Nuri touched anything, but since we are fixing things, we better fix both).
  • Write a test that tries to exercise that code:
    • Create a binary with yaml2obj having a LC_CODE_SIGNATURE to start triggering all the new code.
    • Execute something like llvm-install-name-tool -prepend_rpath foo/bar. Analyze that the resulting binary has the right signature. Also check that the text segment has a new offset.
    • Execute something like llvm-install-name-tool -remove_rpath foo/bar that removes the previously prepended rpath. Do the same checks as above.

Do you think that would be enough? Were you thinking in something different?

Yeah, sounds reasonable to me, just to be clear I'd like to point out that the input binary is kinda important, it should trigger these new code paths in a meaningful way (the mentioned above offsets / load command indices should change)