This is an archive of the discontinued LLVM Phabricator instance.

Regenerate LC_CODE_SIGNATURE during llvm-objcopy operations
ClosedPublic

Authored by nuriamari on Oct 5 2021, 8:26 AM.

Details

Summary

Context:

This is a second attempt at introducing signature regeneration to llvm-objcopy. In this diff: https://reviews.llvm.org/D109840, a script was introduced to test
the validity of a code signature. In this diff: https://reviews.llvm.org/D109803 (now reverted), an effort was made to extract the signature generation behavior out of LLD into a common location for use in llvm-objcopy. In this diff: https://reviews.llvm.org/D109972 it was decided that there was no appropriate common location and that a small amount of duplication to bring signature generation to llvm-objcopy would be better. This diff introduces this duplication.

Summary

Prior to this change, if a LC_CODE_SIGNATURE load command
was included in the binary passed to llvm-objcopy, the command and
associated section were simply copied and included verbatim in the
new binary. If rest of the binary was modified at all, this results
in an invalid Mach-O file. This change regenerates the signature
rather than copying it.

The code_signature_lc.test test was modified to include the yaml
representation of a small signed MachO executable in order to
effectively test the signature generation.

Diff Detail

Event Timeline

nuriamari created this revision.Oct 5 2021, 8:26 AM
Herald added a reviewer: gkm. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
Herald added a subscriber: abrachet. · View Herald Transcript
nuriamari edited the summary of this revision. (Show Details)Oct 5 2021, 9:59 AM
nuriamari added a reviewer: drodriguez.
nuriamari updated this revision to Diff 377293.Oct 5 2021, 10:32 AM

Rename CodeSignatureSectionInfo struct

nuriamari published this revision for review.Oct 5 2021, 10:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2021, 10:33 AM

I've added some in-line comments, but the high level approach looks reasonable to me, thanks for working on this!

llvm/tools/llvm-objcopy/MachO/MachOLayoutBuilder.cpp
289

nit: maybe drop "Section" from the name ?, i.e. CodeSignatureSection -> CodeSignature, CodeSignatureSectionSize -> CodeSignatueSize (it's a bit shorter + it's not a section per se).

llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
482

1/ just in case - what happens if there is no __TEXT in the binary - what does codesign do in this case - does it still create something or bail out ?
2/ is there any other place where getSegmentFileOffset / getSegmentFileSize are used ?
(if no - then probably it'd be better not to store them and just calculate here or have a helper function or method that returns them) (they also can change during some binary transformations e.g. adding / removing load commands) (this might not be observable on small tests because of the alignment)

llvm/tools/llvm-objcopy/MachO/Object.h
354–355

maybe it'd be better to make it a part of MachOLayoutBuilder ?
(since it appears to be used only there)

nuriamari updated this revision to Diff 377847.Oct 7 2021, 7:43 AM
nuriamari marked 3 inline comments as done.

Remove "Section" in variable names, move CodeSignature to MachOLayoutBuilder, move text segment helpers

nuriamari added inline comments.Oct 7 2021, 7:44 AM
llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
482

1/ Behaviour is the same as far as I can tell, checks for __TEXT segment here and if the segment isn't found, creates an "empty" OutputSegment, which has fileoff and filesize values of 0.

2/ No other usages, I will move the helpers here. I don't think getting stale values is an issue since the calls are made after the MachOLayoutBuilder updates the values.

llvm/tools/llvm-objcopy/MachO/Object.h
354–355

It is also used in the MachOWriter, but I suppose the Writer can reach into the LayoutBuilder for that information.

The symlink creates a dependency from LLVM to lld, while lld depends on LLVM ^^

The symlink creates a dependency from LLVM to lld, while lld depends on LLVM ^^

Would creating the symlink in the reverse direction be acceptable?

drodriguez added inline comments.Oct 7 2021, 9:21 AM
llvm/tools/llvm-objcopy/MachO/Object.h
354–355

@alexander-shaposhnikov: Can we talk again about this? It feels as if the rest of the LinkData blobs have their information hold in this Object struct. Most of them are simply the data itself, but for other things, like IndirectSymbolTable, it seems that they are defined here, and used in the MachOReader, the MachOLayoutBuilder and the MachOWriter. In the case of the LC_CODE_SIGNATURE, we don't have any use in MachOReader because the information is discarded, but it seems to me that this is the correct place. Additionally, to be usage from MachOWriter and MachOLayoutBuilder, it needs to end up as a public instance variable in MachOLayoutBuilder, making it the first public instance variable of that class.

In any case, it is a minimal thing, and I don't need it to be changed. I just wanted to point out the similarities with the code here.

The symlink creates a dependency from LLVM to lld, while lld depends on LLVM ^^

Would creating the symlink in the reverse direction be acceptable?

That sounds better. I don't know how happy Windows is about symlinks. I have the feeling that you will accept a bit of code duplication. How about two copies?

llvm/tools/llvm-objcopy/MachO/Object.h
354–355

the pipeline looks like this: parse the input (the output is Object) -> apply various transformations (to Object) -> write the final output (including the calculations of the new layout). Therefore, storing "offsets" in Object doesn't appear to be right, another indication why it's not right - they are not used (and should not be used (because they can be invalidated by some transformations as I pointed out in another comment)) before emitting the new object file.

llvm/tools/llvm-objcopy/MachO/Object.h
354–355

regarding public instance - unless I'm missing something this won't happen.
Currently these values are calculated in MachOLayoutBuilder::layoutTail, it's perfectly fine for layoutTail to have a write access a member of the same class. External customers can use a getter that returns a const ref.

llvm/tools/llvm-objcopy/MachO/Object.h
101

looks like in the latest revision this file has not been updated

drodriguez added inline comments.Oct 7 2021, 2:16 PM
llvm/tools/llvm-objcopy/MachO/Object.h
354–355

Understood. Thanks for explaining the concepts behind the code flow. This blob size and contents depending on the previous content is a little bit of a problem (specially the total size), and specially the size needed to be included in the __LINKEDIT information calculated during the layout.

nuriamari updated this revision to Diff 378186.Oct 8 2021, 5:42 AM
nuriamari marked 5 inline comments as done.

Copy test script, remove getSegment* declarations, add const ref getter

llvm/tools/llvm-objcopy/MachO/Object.h
302–308

would be good to move this struct declaration too (into MachOLayoutBuilder.h)

Just in case - do we have a test (for the new functionality) where the index of LC_SEGMENT* and the offset of __TEXT are modified ? (to ensure that the signature is calculated correctly in this case)

Just in case - do we have a test (for the new functionality) where the index of LC_SEGMENT* and the offset of __TEXT are modified ? (to ensure that the signature is calculated correctly in this case)

With the previous version of the code I remember checking if llvm-strip was generating the right thing and it did work (which was very nice to see). Is that something like what you asking? Do you have some existing test in mind that we can use as a starting point?

adding / removing load commands (e.g. rpath) can trigger such changes (but beware the alignment + the order of load commands matters) so the test needs to be specifically designed for these purposes

(I would use llvm-objdump or llvm-readobj to verify that the offset of __TEXT has changed etc)

P.S. btw - you probably want to update TextSegmentCommandIndex inside updateLoadCommandIndexes(..) and such a test would catch it, alternatively, it can be calculated where it's actually used (and the field can be dropped)

adding / removing load commands (e.g. rpath) can trigger such changes (but beware the alignment + the order of load commands matters) so the test needs to be specifically designed for these purposes

(I would use llvm-objdump or llvm-readobj to verify that the offset of __TEXT has changed etc)

P.S. btw - you probably want to update TextSegmentCommandIndex inside updateLoadCommandIndexes(..) and such a test would catch it, alternatively, it can be calculated where it's actually used (and the field can be dropped)

Hi Alexander,

Thanks for your feedback, sorry I've been a little inactive the last few days. I have moved the CodeSignatureInfo struct and I will add a new test like you are suggesting. I am little busy with other work at the moment, but I will try to get to it later this week.

adding / removing load commands (e.g. rpath) can trigger such changes (but beware the alignment + the order of load commands matters) so the test needs to be specifically designed for these purposes

(I would use llvm-objdump or llvm-readobj to verify that the offset of __TEXT has changed etc)

P.S. btw - you probably want to update TextSegmentCommandIndex inside updateLoadCommandIndexes(..) and such a test would catch it, alternatively, it can be calculated where it's actually used (and the field can be dropped)

OK. We have been looking at what you were pointing out in this comment, and we want to run the plan by you (and anyone else reading), so we don't detour very far.

  • Fix updateLoadCommandIndex(...) to handle both the new TextSegmentCommandIndex and also CodeSignatureCommandIndex (this is an error introduced before Nuri touched anything, but since we are fixing things, we better fix both).
  • Write a test that tries to exercise that code:
    • Create a binary with yaml2obj having a LC_CODE_SIGNATURE to start triggering all the new code.
    • Execute something like llvm-install-name-tool -prepend_rpath foo/bar. Analyze that the resulting binary has the right signature. Also check that the text segment has a new offset.
    • Execute something like llvm-install-name-tool -remove_rpath foo/bar that removes the previously prepended rpath. Do the same checks as above.

Do you think that would be enough? Were you thinking in something different?

Yeah, sounds reasonable to me, just to be clear I'd like to point out that the input binary is kinda important, it should trigger these new code paths in a meaningful way (the mentioned above offsets / load command indices should change)

Update updateLoadCommandIndexes to include TextSegment/CodeSignature commands, add test inserting load command shifting load commands around

nuriamari marked an inline comment as done.Oct 18 2021, 12:31 PM

I expanded updateLoadCommandIndexes to update the CodeSignatureCommandIndex and TextSegmentCommandIndex values. In the new test, I check that the text segment size and offset values included in the code signature header are accurate. I also added asserts to make sure TextSegmentCommandIndex does in fact point to the correct load command, and not just another segment load command with identical size and offset. I was unable to get the text segment offset to change (it remains at 0 even after inserting more commands), but I think this is expected and so long as TextSegmentCommandIndex is accurate, the rest should also be accurate.

LG (i took a look at the code, but didn't closely review the tests)

llvm/tools/llvm-objcopy/MachO/Object.cpp
32 ↗(On Diff #380494)

constexpr

This revision is now accepted and ready to land.Oct 25 2021, 1:02 AM
thevinster added inline comments.
llvm/test/tools/llvm-objcopy/MachO/code_signature_lc_update.test
69 ↗(On Diff #380494)

Just wondering here, from inspection, shouldn't this be 280? To match the lines indicating CHECK-ORIGINAL?

nuriamari updated this revision to Diff 381989.Oct 25 2021, 7:34 AM
nuriamari marked 2 inline comments as done.

Constexpr C strings

llvm/test/tools/llvm-objcopy/MachO/code_signature_lc_update.test
69 ↗(On Diff #380494)

Not necessarily since I don't think the increase to 320 is entirely a result of the new RPATH command. The datasize can often change the first time llvm-objcopy is run on a binary, even without any changes. You can see an example in code_signature_lc.test file, the datasize also inflates. I think this is just a consequence of llvm-objcopy slightly changing binaries produced by other tools. Once llvm-objcopy is run on the binary once, no changes are observed on a further runs.

This revision was landed with ongoing or failed builds.Oct 26 2021, 2:52 PM
This revision was automatically updated to reflect the committed changes.
drodriguez added inline comments.Oct 26 2021, 4:29 PM
llvm/test/tools/llvm-objcopy/MachO/code_signature_lc.test
27–35

We are aware that one of these seems not to work with macOS sed. We are looking into how to fix. If somebody feels this need to be reverted, please do.

llvm/test/tools/llvm-objcopy/MachO/code_signature_lc.test
16

just in case - wouldn't FileCheck %s --check-prefix=CHECK-COPY <%t.copy.yaml work ? (get rid of cat)

27–35

fwiw, if one day this test breaks it might be hard to debug (especially on windows), perhaps, another approach would be to replace RUN: diff
with a more granular check, e.g. it doesn't need to check all the parts of the binary

Created D112583 with a possible fix. My macOS does not fail the same as in CI, and I don't have a 10.15 machine at hand to actually check.

The change for cat is also there.

Hi,

It looks like

LLVM :: tools/llvm-objcopy/MachO/universal-object.test

starts failing with this patch when the compiler is built with ubsan:
http://lab.llvm.org:8011/#/builders/5/builds/13703

Hi,

It looks like

LLVM :: tools/llvm-objcopy/MachO/universal-object.test

starts failing with this patch when the compiler is built with ubsan:
http://lab.llvm.org:8011/#/builders/5/builds/13703

Has been failing for several days, too.

@uabelho: Thanks for pointing it out. I think I found a possible solution. It seems to me that llvm-objcopy might store data unaligned temporarily, and we did not realize.

@eugenis: Can you clarify what you mean? From my testing and the stack trace it seems to have been introduced by this change. It might have been failing in some other way that I am not aware, but after fixing the problem, it seems that the UBSan runtime error goes away.

thakis added a subscriber: thakis.Apr 22 2022, 6:02 AM

cctools's strip only resigns after stripping if the signature was linker-generated (i.e. MachO::CS_ADHOC | MachO::CS_LINKER_SIGNED are set), and otherwise emits a warning:

% cat hello.cc
#include <stdio.h>
int main() { printf("hello\n"); }
% clang hello.cc && codesign --sign - a.out && strip a.out
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/strip: warning: changes being made to the file will invalidate the code signature in: /Users/thakis/src/llvm-project/a.out
% ./a.out
zsh: killed     ./a.out. # because signature is now wrong

This makes some sense: If a binary was explicitly signed, the explicit signature will be wrong.

On the other hand, if we re-sign, the binary can at least run, so I'm not sure if we should fully adopt cctools's behavior. But maybe we should emit a warning if we re-sign a binary that doesn't have a linker-generated signature?

Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2022, 6:02 AM

In my tests with cctools I do not think I ever used a externally signed binary. I was more worried about adhoc signed binaries from the toolchain itself. I do not think adding a warning will influence our usage, so if that compatibility with cctools is needed, I will be happy to review and approve those changes.