This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] add code signature for native arm64 macOS
ClosedPublic

Authored by gkm on Feb 5 2021, 11:08 AM.

Details

Reviewers
int3
Group Reviewers
Restricted Project
Commits
rG151990dd94e5: [lld-macho] add code signature for native arm64 macOS
Summary

ARM64 macOS requires that a program have an ad hoc signature, otherwise the system will immediately reject it.

Diff Detail

Event Timeline

gkm created this revision.Feb 5 2021, 11:08 AM
gkm requested review of this revision.Feb 5 2021, 11:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2021, 11:08 AM
int3 added a comment.Feb 5 2021, 6:16 PM

hm is this that much more convenient than invoking codesign after LLD finishes running in whatever build we're testing?

thakis added a subscriber: thakis.Feb 5 2021, 6:42 PM

I think codesign -s also doesn't set the magic "codesigned by linker" bit that makes strip etc work better (and if doesn't work in cross builds from non-mac hosts). I kind of agree that putting this in as is isn't super useful.

(Don't know if you've seen it, but golang has a from scratch impl of this for 1.16 -- and there's ld64 too of course.)

gkm added a comment.Feb 5 2021, 11:35 PM

hm is this that much more convenient than invoking codesign after LLD finishes running in whatever build we're testing?

ld64 signs automatically. Config & build systems of all kinds assume that linked programs are runnable. On arm64 macOS, if lld doesn't sign, then programs don't run without explicit codesign -s - ..., and every config & build system needs to be taught that. No thank you.

I think codesign -s also doesn't set the magic "codesigned by linker" bit that makes strip etc work better (and if doesn't work in cross builds from non-mac hosts). I kind of agree that putting this in as is isn't super useful.

FYI, it is super useful for testing self-hosted native arm64 macOS builds of LLVM and other packages (e.g., GNU coreutils). If you don't want it committed to LLVM, that's fine with me. I have been using it productively as a local change and others who want to work on native arm64 macOS can place it at the base of their local diff stacks too.

(Don't know if you've seen it, but golang has a from scratch impl of this for 1.16 -- and there's ld64 too of course.)

I briefly looked at ld64-530 (the most recent Apple code drop), and could not find any code that generates LC_CODE_SIGNATURE. I will look harder this weekend.

gkm added a comment.Feb 6 2021, 5:08 PM

In order to perform ad-hoc codesign, I will need to compute sha256 of the program's pages. I see no other part of LLVM that uses a sha256 library. What library should I use?

tschuett added a comment.EditedFeb 7 2021, 12:04 AM

Some cmake magic to look for OpenSSL? If you can't find it, give a warning and disable code-signing?

"strip and install_name_tool are aware that what they do will invalidate a code signature, so if they’re operating on a linker-signed Mach-O file, they’ll generate a new linker-signed signature."
https://github.com/golang/go/issues/42684#issuecomment-731594886

The code-signing functionality has to end up being in LLVM eventually for llvm-install-name-tool?

In D96164#2547158, @gkm wrote:

In order to perform ad-hoc codesign, I will need to compute sha256 of the program's pages. I see no other part of LLVM that uses a sha256 library. What library should I use?

LLVM tries to not depend on external libraries if possible. Given that sha256 isn't that much code (https://golang.org/src/crypto/sha256/), just adding a fresh implementation of it (to lld/MachO for now) is probably the way to go.

In D96164#2547158, @gkm wrote:

In order to perform ad-hoc codesign, I will need to compute sha256 of the program's pages. I see no other part of LLVM that uses a sha256 library. What library should I use?

LLVM tries to not depend on external libraries if possible. Given that sha256 isn't that much code (https://golang.org/src/crypto/sha256/), just adding a fresh implementation of it (to lld/MachO for now) is probably the way to go.

Agreed. There's some prior art here too ... LLVM has a SHA1 implementation in llvm/include/llvm/Support/SHA1.h and llvm/lib/Support/SHA1.cpp. That was adapted from an existing public-domain implementation, and we could do the same here (find a SHA256 implementation with a license that allows it to copied into LLVM).

gkm added a comment.EditedFeb 7 2021, 10:01 PM

df>>! In D96164#2547480, @smeenai wrote:

LLVM tries to not depend on external libraries if possible. Given that sha256 isn't that much code (https://golang.org/src/crypto/sha256/), just adding a fresh implementation of it (to lld/MachO for now) is probably the way to go.

Agreed. There's some prior art here too ... LLVM has a SHA1 implementation in llvm/include/llvm/Support/SHA1.h and llvm/lib/Support/SHA1.cpp. That was adapted from an existing public-domain implementation, and we could do the same here (find a SHA256 implementation with a license that allows it to copied into LLVM).

Openssl has a C implementation of sha256 with Apache 2.0 license. Unless someone knows of a better candidate, I'll go with that.

cynecx added a subscriber: cynecx.Feb 10 2021, 8:26 AM
This comment was removed by cynecx.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2021, 8:26 AM

Perhaps this might help: SHA256 impl for LLVM. This is based on botan's sha256 implementation and mimics the current SHA1 api design. It should be compatible with LLVM's license since it's based on botan's implementation, which is BSD-2 licensed.

gkm added a comment.Feb 10 2021, 5:51 PM

I think codesign -s also doesn't set the magic "codesigned by linker" bit that makes strip etc work better (and if doesn't work in cross builds from non-mac hosts). I kind of agree that putting this in as is isn't super useful.

Where can I learn about thsi magic "codesigned by linker" bit? My searches are coming-up dry.

In D96164#2547858, @gkm wrote:

df>>! In D96164#2547480, @smeenai wrote:

LLVM tries to not depend on external libraries if possible. Given that sha256 isn't that much code (https://golang.org/src/crypto/sha256/), just adding a fresh implementation of it (to lld/MachO for now) is probably the way to go.

Agreed. There's some prior art here too ... LLVM has a SHA1 implementation in llvm/include/llvm/Support/SHA1.h and llvm/lib/Support/SHA1.cpp. That was adapted from an existing public-domain implementation, and we could do the same here (find a SHA256 implementation with a license that allows it to copied into LLVM).

Openssl has a C implementation of sha256 with Apache 2.0 license. Unless someone knows of a better candidate, I'll go with that.

I'm afraid that just copying an Apache-2.0 licensed implementation is not an option.

As described at https://llvm.org/docs/DeveloperPolicy.html#relicensing, "....In the interim, all contributions to the project will be made under the terms of both the new license and the legacy license scheme...", so code contributions must be covered by both the new and legacy license scheme.
The SPDX identifier for the new license is "Apache-2.0 WITH LLVM-exception"; for the legacy license it is "NCSA".

Therefore, just copying code that is covered by an Apache-2.0 license into the LLVM code base is not an option I'm afraid.
Apache-2.0 is a different license to "Apache-2.0 with LLVM exception" and to "NCSA".
I guess the most obvious options to get a sha256 implementation into the code base are:

  1. get the copyright holder of an existing implementation to contribute it to LLVM, under both the "Apache-2.0 WITH LLVM-exception" and "NCSA" license.
  2. write an implementation from scratch and contribute that.
gkm added a comment.Feb 11 2021, 10:37 AM

Perhaps this might help: SHA256 impl for LLVM. This is based on botan's sha256 implementation and mimics the current SHA1 api design. It should be compatible with LLVM's license since it's based on botan's implementation, which is BSD-2 licensed.

This is super helpful! Is it ready to import into LLVM as-is alongside SHA1 ?

@gkm Is it okay for you if I post a patch which adds a SHA-256 implementation to LLVM? I actually have a patchset ready here. But to answer your question, yes it should be possible to simply drop the files into the right locations.

cynecx added a comment.EditedFeb 11 2021, 12:25 PM

@gkm Re: the linker flag. I think they were talking about this (linkerSigned):

	cdir := CodeDirectory{
		magic:        CSMAGIC_CODEDIRECTORY,
		length:       uint32(sz) - uint32(unsafe.Sizeof(SuperBlob{})+unsafe.Sizeof(Blob{})),
		version:      0x20400,
		flags:        0x20002, // adhoc | linkerSigned (0b10)
		hashOffset:   uint32(hashOff),
		identOffset:  uint32(idOff),
		nCodeSlots:   uint32(nhashes),
		codeLimit:    uint32(sigOff),
		hashSize:     sha256.Size,
		hashType:     kSecCodeSignatureHashSHA256,
		pageSize:     uint8(pageSizeBits),
		execSegBase:  textSeg.Offset,
		execSegLimit: textSeg.Filesz,
	}

https://go-review.googlesource.com/c/scratch/+/271866/5/cherry/codesign.go#293

flags:        0x20002, // adhoc | linkerSigned (0b10)

from cs_blobs.h

#define CS_ADHOC                    0x00000002  /* ad hoc signed */
#define CS_LINKER_SIGNED            0x00020000  /* Automatically signed by the linker */
cynecx added a comment.EditedFeb 11 2021, 1:23 PM

SHA256 patch is up here: https://reviews.llvm.org/D96540.

gkm updated this revision to Diff 323572.Feb 13 2021, 1:59 PM
gkm added a comment.Feb 13 2021, 1:59 PM
  • drop the fork+exec of /usr/bin/codesign and rewrite in C++
gkm updated this revision to Diff 323573.Feb 13 2021, 2:04 PM
  • remove SHA256, which is added separately in D96540
gkm edited the summary of this revision. (Show Details)Feb 13 2021, 2:16 PM
gkm edited the summary of this revision. (Show Details)Feb 13 2021, 2:18 PM
gkm edited the summary of this revision. (Show Details)
gkm edited the summary of this revision. (Show Details)
gkm edited the summary of this revision. (Show Details)
gkm edited the summary of this revision. (Show Details)Feb 13 2021, 2:21 PM
tschuett added inline comments.Feb 14 2021, 6:46 AM
lld/MachO/SyntheticSections.cpp
925 ↗(On Diff #323573)

Would it help readability to change this to:
CS_ADHOC | CS_LINKER_SIGNED ?

Does codesign.go make any differences?

gkm edited the summary of this revision. (Show Details)Feb 14 2021, 12:02 PM
gkm edited the summary of this revision. (Show Details)Feb 14 2021, 12:05 PM

Hmm, this is indeed very weird. I have seen "similar" behavior with codesign. codesign can actually fail. The only way to "get it working" again is to copy the file and codesign the copy, then replacing the original file (a different inode fixes codesigning).

Does the segfault happen with each linking step?

gkm added a comment.Feb 14 2021, 2:29 PM

Hmm, this is indeed very weird. I have seen "similar" behavior with codesign. codesign can actually fail. The only way to "get it working" again is to copy the file and codesign the copy, then replacing the original file (a different inode fixes codesigning).

Does the segfault happen with each linking step?

Not a segfault. SIGKILL, presumably dyld doing kill(0, SIGKILL) though I haven't yet run it under lldb to confirm.

gkm marked an inline comment as done.Feb 14 2021, 5:12 PM

Does codesign.go make any differences?

I haven't tried, but I expect it would work. Resigning with /usr/bin/codesign works. Heck, just copying the binary causes it to work!

lld/MachO/SyntheticSections.cpp
925 ↗(On Diff #323573)

Those are good things. I intend to find & import headers with those enums.

Re the copy stuff: See the CL desc and openradar link in https://reviews.llvm.org/rGba3bc2fd41b8428904fc779e353d3769074d8982

gkm marked an inline comment as done.EditedFeb 14 2021, 8:54 PM

Re the copy stuff: See the CL desc and openradar link in https://reviews.llvm.org/rGba3bc2fd41b8428904fc779e353d3769074d8982

That is good to know. This case differs: LLD is not overwriting an existing file, so is not reusing an vnode.

There is something funky in FileOutputBuffer, which creates the LLD output file via mmap(2). When I force it to avoid mmap, all is well. For now, I will work-around by passing the flag F_no_mmap when signing.

gkm added a comment.Feb 14 2021, 9:54 PM
In D96164#2562701, @gkm wrote:

Re the copy stuff: See the CL desc and openradar link in https://reviews.llvm.org/rGba3bc2fd41b8428904fc779e353d3769074d8982

That is good to know. This case differs: LLD is not overwriting an existing file, so is not reusing an vnode. There is something funky in FileOutputBuffer, which creates the LLD output file via mmap(2).

Travis Hill <travis@xojo.com> fed me the answer. The Go peeps had the same problem: https://go-review.googlesource.com/c/go/+/272258
The Darwin kernel caches signature verification results by vnode. mmap(2) creates the cache entry, before we have written anything to the output. The fix is to invalidate the cache after writing via msync(..., MS_INVALIDATE);

gkm edited the summary of this revision. (Show Details)Feb 14 2021, 9:54 PM
gkm updated this revision to Diff 323665.Feb 14 2021, 10:34 PM
  • Invalidate signature-verification cache after writing hashes.
  • replace CodeSigningTypes.h with augmented cs_blobs.h
  • use appropriate constant symbols from cs_blobs.h rather than magic integers
In D96164#2562720, @gkm wrote:
In D96164#2562701, @gkm wrote:

Re the copy stuff: See the CL desc and openradar link in https://reviews.llvm.org/rGba3bc2fd41b8428904fc779e353d3769074d8982

That is good to know. This case differs: LLD is not overwriting an existing file, so is not reusing an vnode. There is something funky in FileOutputBuffer, which creates the LLD output file via mmap(2).

Travis Hill <travis@xojo.com> fed me the answer. The Go peeps had the same problem: https://go-review.googlesource.com/c/go/+/272258
The Darwin kernel caches signature verification results by vnode. mmap(2) creates the cache entry, before we have written anything to the output. The fix is to invalidate the cache after writing via msync(..., MS_INVALIDATE);

Ah, the mmap one is https://openradar.appspot.com/FB8914231 :)

gkm added a comment.Feb 15 2021, 9:31 AM

I could use some advice regarding import of the the Apple C (not C++) header cs_blobs.h:

  • Some struct decls contain zero-length array members as offset markers. These are invalid C++, and LLD doesn't need them.
  • Some struct decls contain flexible array members. These are invalid C++, and LLD doesn't need them.
  • Some flag bit sets are specified via sequences of #define. LLVM prefers enum.
  • All struct decls have __attribute__((aligned(1))). LLD does not need or want these.
  • Struct definition pattern is typedef struct __TAG { ... } NAME; C++ prefers struct NAME { ... };
  • The linter complains about case style for struct names.

Questions:

  • Should we make minimal edits to get proper compilation & semantics and leave all the other warts as-is?
  • Or, should we be boldly invasive and make it conform to C++ and LLVM style, silencing the linter?
  • What is best practice regarding header imports that require surgery? It seems wise to import the unaltered Apple header first via a preparatory commit (without a diff? unreviewed?), and then commit this diff with the changes to make it conform to C++ & LLVM style.

Feel free to ignore me, but:
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/BinaryFormat/MachO.h
seems to be inspired by the Apple headers. It is not a one-to-one copy.

gkm added a comment.Feb 15 2021, 10:09 AM

https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/BinaryFormat/MachO.h
seems to be inspired by the Apple headers. It is not a one-to-one copy.

This seems like a good option to me: don't import cs_blobs.h at all, but rather adapt & append the necessary codesigning definitions to existing llvm/include/llvm/BinaryFormat/MachO.h.

  • What is best practice regarding header imports that require surgery? It seems wise to import the unaltered Apple header first via a preparatory commit (without a diff? unreviewed?), and then commit this diff with the changes to make it conform to C++ & LLVM style.

There are currently 0 hits for APPLE_LICENSE_HEADER in llvm's codebase. I'm guessing that's not a coincidence :)

gkm updated this revision to Diff 324048.Feb 16 2021, 10:24 AM
  • import code signing structs & enums into llvm/include/llvm/BinaryFormat/MachO.h
cynecx added inline comments.Feb 18 2021, 11:09 AM
lld/MachO/SyntheticSections.h
489 ↗(On Diff #324048)

constexpr?

491 ↗(On Diff #324048)

Since we are currently hard-wiring sha256, we should just use the appropriate constant here.

llvm/include/llvm/BinaryFormat/MachO.h
2151 ↗(On Diff #324048)

Can we add some static_asserts for these structs to ensure that they have the correct size? (In case the compiler does weird things)

cynecx added inline comments.Feb 18 2021, 11:15 AM
lld/MachO/SyntheticSections.h
491 ↗(On Diff #324048)

Oh crap. The constant is actually private. Nvm then. Perhaps add a comment then?

gkm added inline comments.Feb 18 2021, 11:36 AM
lld/MachO/SyntheticSections.h
491 ↗(On Diff #324048)

SHA256::HASH_LENGTH is defined as 20, which is 160 bits. I am confused.

gkm added inline comments.Feb 18 2021, 12:09 PM
lld/MachO/SyntheticSections.h
491 ↗(On Diff #324048)

Better. I had a stale version of SHA256.cpp loaded in emacs. No clue where that came from.

gkm marked 4 inline comments as done.Feb 19 2021, 7:01 PM
gkm added inline comments.
lld/MachO/SyntheticSections.h
489 ↗(On Diff #324048)

Comments elsewhere say that static constexpr members are broken in MSC, so I avoided them.

cynecx added inline comments.Feb 20 2021, 8:56 AM
lld/MachO/SyntheticSections.h
489 ↗(On Diff #324048)

I think those comments are outdated because according to https://llvm.org/docs/GettingStarted.html#host-c-toolchain-both-compiler-and-standard-library, llvm requires >= Visual Studio 2017, which iirc properly supports constexpr. Also there are many places where llvm code utilizes "static constexpr".

gkm updated this revision to Diff 325231.Feb 20 2021, 12:44 PM
gkm marked an inline comment as done.
  • revise according to review feedback
int3 added a comment.Feb 20 2021, 11:03 PM

lgtm, will like @cynecx sign off

lld/MachO/SyntheticSections.cpp
882 ↗(On Diff #325231)

FWIW, StringRef::size() is O(1), so storing it isn't really necessary. But if you did this because it looks more uniform, that's fine too

910–912 ↗(On Diff #325231)
917 ↗(On Diff #325231)

I think we already included this namespace at the top of the file

943 ↗(On Diff #325231)

JFYI, there are several other places in the code that assume that the mmap'ed buf is initialized to all zeroes. I couldn't find any docs that guarantee that behavior, but it seems to work in practice. That said it doesn't hurt to be explicit

lld/MachO/Writer.cpp
764–765

is this necessary? I don't think hashing a few extra zero bytes makes a difference

gkm updated this revision to Diff 325314.Feb 21 2021, 9:36 AM
gkm marked 5 inline comments as done.
  • revise according to review feedback
gkm updated this revision to Diff 325316.Feb 21 2021, 9:42 AM
  • set version as ... s/0x20400/CS_SUPPORTSEXECSEG/
int3 added inline comments.Feb 21 2021, 9:56 AM
lld/MachO/Writer.cpp
764–765

this doesn't seem done

gkm marked an inline comment as done.Feb 21 2021, 11:56 AM
gkm added inline comments.
lld/MachO/Writer.cpp
764–765

Yes. codeEnd abuts the code-signature header, which is not hashed. "A few extra bytes" are significant, whether zero or not, and influence the hash. We stop hashing at codeEnd in order to honor the contract between signer and runtime.

764–765

Oh, sorry. I was confused. This is for UUID, not code-signature. I am playing it safe here: only compute UUID for the image as it is before code signature is added.

How do you prefer I handle this?

cynecx added inline comments.Feb 21 2021, 12:29 PM
lld/MachO/Writer.cpp
764–765

Isn't codesign doing the exact same thing? So, I guess the current approach is alright.

gkm marked 2 inline comments as done.Feb 21 2021, 1:31 PM

I won't complain if someone accepts the diff! ;)

lld/MachO/Writer.cpp
764–765

The mechanisms are similar:

  • LC_UUID has a minimal header and signs with md5
  • LC_CODE_SIGNATURE has a more elaborate header and signs with sha256

The precise use(s) of the UUID is unclear to me, due to lack of documentation. As best I can tell, it is used as a unique identifier for program & library content, i.e., it morphs with every code change. I do not see it used to verify content.

int3 added inline comments.Feb 21 2021, 1:49 PM
lld/MachO/Writer.cpp
764–765

@gkm you forgot to hit submit the first time you replied ;)

yeah the UUID is just used to match .dSYM files with their corresponding executables. It doesn't matter what the value is, or how it's computed, as long as it's deterministic. So I think it's fine to just hash the whole buffer, including the unwritten codesign bits.

gkm marked 2 inline comments as done.Feb 21 2021, 7:21 PM
gkm updated this revision to Diff 325363.Feb 21 2021, 7:21 PM
  • revise according to review feedback
gkm updated this revision to Diff 325373.Feb 21 2021, 10:30 PM
  • wrap #if defined(__APPLE__) around <sys/mman.h> and msync()
int3 accepted this revision.Feb 22 2021, 8:35 AM

lgtm

This revision is now accepted and ready to land.Feb 22 2021, 8:35 AM
gkm updated this revision to Diff 326161.Feb 24 2021, 11:37 AM
  • expand some comments, improve some alignment expressions
gkm updated this revision to Diff 326245.Feb 24 2021, 5:05 PM
  • final alignment & comment tweaks
This revision was landed with ongoing or failed builds.Feb 24 2021, 5:05 PM
This revision was automatically updated to reflect the committed changes.
thakis added a comment.Mar 4 2021, 4:56 PM

Did this land without any tests?

Did this land without any tests?

Yes, this was an oversight. Thanks for adding a test in D97994.