ARM64 macOS requires that a program have an ad hoc signature, otherwise the system will immediately reject it.
Details
- Reviewers
int3 - Group Reviewers
Restricted Project - Commits
- rG151990dd94e5: [lld-macho] add code signature for native arm64 macOS
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
hm is this that much more convenient than invoking codesign after LLD finishes running in whatever build we're testing?
I think codesign -s also doesn't set the magic "codesigned by linker" bit that makes strip etc work better (and if doesn't work in cross builds from non-mac hosts). I kind of agree that putting this in as is isn't super useful.
(Don't know if you've seen it, but golang has a from scratch impl of this for 1.16 -- and there's ld64 too of course.)
ld64 signs automatically. Config & build systems of all kinds assume that linked programs are runnable. On arm64 macOS, if lld doesn't sign, then programs don't run without explicit codesign -s - ..., and every config & build system needs to be taught that. No thank you.
FYI, it is super useful for testing self-hosted native arm64 macOS builds of LLVM and other packages (e.g., GNU coreutils). If you don't want it committed to LLVM, that's fine with me. I have been using it productively as a local change and others who want to work on native arm64 macOS can place it at the base of their local diff stacks too.
(Don't know if you've seen it, but golang has a from scratch impl of this for 1.16 -- and there's ld64 too of course.)
I briefly looked at ld64-530 (the most recent Apple code drop), and could not find any code that generates LC_CODE_SIGNATURE. I will look harder this weekend.
In order to perform ad-hoc codesign, I will need to compute sha256 of the program's pages. I see no other part of LLVM that uses a sha256 library. What library should I use?
Some cmake magic to look for OpenSSL? If you can't find it, give a warning and disable code-signing?
"strip and install_name_tool are aware that what they do will invalidate a code signature, so if they’re operating on a linker-signed Mach-O file, they’ll generate a new linker-signed signature."
https://github.com/golang/go/issues/42684#issuecomment-731594886
The code-signing functionality has to end up being in LLVM eventually for llvm-install-name-tool?
LLVM tries to not depend on external libraries if possible. Given that sha256 isn't that much code (https://golang.org/src/crypto/sha256/), just adding a fresh implementation of it (to lld/MachO for now) is probably the way to go.
Agreed. There's some prior art here too ... LLVM has a SHA1 implementation in llvm/include/llvm/Support/SHA1.h and llvm/lib/Support/SHA1.cpp. That was adapted from an existing public-domain implementation, and we could do the same here (find a SHA256 implementation with a license that allows it to copied into LLVM).
df>>! In D96164#2547480, @smeenai wrote:
Agreed. There's some prior art here too ... LLVM has a SHA1 implementation in llvm/include/llvm/Support/SHA1.h and llvm/lib/Support/SHA1.cpp. That was adapted from an existing public-domain implementation, and we could do the same here (find a SHA256 implementation with a license that allows it to copied into LLVM).
Openssl has a C implementation of sha256 with Apache 2.0 license. Unless someone knows of a better candidate, I'll go with that.
Perhaps this might help: SHA256 impl for LLVM. This is based on botan's sha256 implementation and mimics the current SHA1 api design. It should be compatible with LLVM's license since it's based on botan's implementation, which is BSD-2 licensed.
Where can I learn about thsi magic "codesigned by linker" bit? My searches are coming-up dry.
I'm afraid that just copying an Apache-2.0 licensed implementation is not an option.
As described at https://llvm.org/docs/DeveloperPolicy.html#relicensing, "....In the interim, all contributions to the project will be made under the terms of both the new license and the legacy license scheme...", so code contributions must be covered by both the new and legacy license scheme.
The SPDX identifier for the new license is "Apache-2.0 WITH LLVM-exception"; for the legacy license it is "NCSA".
Therefore, just copying code that is covered by an Apache-2.0 license into the LLVM code base is not an option I'm afraid.
Apache-2.0 is a different license to "Apache-2.0 with LLVM exception" and to "NCSA".
I guess the most obvious options to get a sha256 implementation into the code base are:
- get the copyright holder of an existing implementation to contribute it to LLVM, under both the "Apache-2.0 WITH LLVM-exception" and "NCSA" license.
- write an implementation from scratch and contribute that.
@gkm Is it okay for you if I post a patch which adds a SHA-256 implementation to LLVM? I actually have a patchset ready here. But to answer your question, yes it should be possible to simply drop the files into the right locations.
@gkm Re: the linker flag. I think they were talking about this (linkerSigned):
cdir := CodeDirectory{ magic: CSMAGIC_CODEDIRECTORY, length: uint32(sz) - uint32(unsafe.Sizeof(SuperBlob{})+unsafe.Sizeof(Blob{})), version: 0x20400, flags: 0x20002, // adhoc | linkerSigned (0b10) hashOffset: uint32(hashOff), identOffset: uint32(idOff), nCodeSlots: uint32(nhashes), codeLimit: uint32(sigOff), hashSize: sha256.Size, hashType: kSecCodeSignatureHashSHA256, pageSize: uint8(pageSizeBits), execSegBase: textSeg.Offset, execSegLimit: textSeg.Filesz, }
https://go-review.googlesource.com/c/scratch/+/271866/5/cherry/codesign.go#293
flags: 0x20002, // adhoc | linkerSigned (0b10)
from cs_blobs.h
#define CS_ADHOC 0x00000002 /* ad hoc signed */ #define CS_LINKER_SIGNED 0x00020000 /* Automatically signed by the linker */
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
927 | Would it help readability to change this to: |
Hmm, this is indeed very weird. I have seen "similar" behavior with codesign. codesign can actually fail. The only way to "get it working" again is to copy the file and codesign the copy, then replacing the original file (a different inode fixes codesigning).
Does the segfault happen with each linking step?
Not a segfault. SIGKILL, presumably dyld doing kill(0, SIGKILL) though I haven't yet run it under lldb to confirm.
I haven't tried, but I expect it would work. Resigning with /usr/bin/codesign works. Heck, just copying the binary causes it to work!
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
927 | Those are good things. I intend to find & import headers with those enums. |
Re the copy stuff: See the CL desc and openradar link in https://reviews.llvm.org/rGba3bc2fd41b8428904fc779e353d3769074d8982
That is good to know. This case differs: LLD is not overwriting an existing file, so is not reusing an vnode.
There is something funky in FileOutputBuffer, which creates the LLD output file via mmap(2). When I force it to avoid mmap, all is well. For now, I will work-around by passing the flag F_no_mmap when signing.
Travis Hill <travis@xojo.com> fed me the answer. The Go peeps had the same problem: https://go-review.googlesource.com/c/go/+/272258
The Darwin kernel caches signature verification results by vnode. mmap(2) creates the cache entry, before we have written anything to the output. The fix is to invalidate the cache after writing via msync(..., MS_INVALIDATE);
- Invalidate signature-verification cache after writing hashes.
- replace CodeSigningTypes.h with augmented cs_blobs.h
- use appropriate constant symbols from cs_blobs.h rather than magic integers
I could use some advice regarding import of the the Apple C (not C++) header cs_blobs.h:
- Some struct decls contain zero-length array members as offset markers. These are invalid C++, and LLD doesn't need them.
- Some struct decls contain flexible array members. These are invalid C++, and LLD doesn't need them.
- Some flag bit sets are specified via sequences of #define. LLVM prefers enum.
- All struct decls have __attribute__((aligned(1))). LLD does not need or want these.
- Struct definition pattern is typedef struct __TAG { ... } NAME; C++ prefers struct NAME { ... };
- The linter complains about case style for struct names.
Questions:
- Should we make minimal edits to get proper compilation & semantics and leave all the other warts as-is?
- Or, should we be boldly invasive and make it conform to C++ and LLVM style, silencing the linter?
- What is best practice regarding header imports that require surgery? It seems wise to import the unaltered Apple header first via a preparatory commit (without a diff? unreviewed?), and then commit this diff with the changes to make it conform to C++ & LLVM style.
Feel free to ignore me, but:
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/BinaryFormat/MachO.h
seems to be inspired by the Apple headers. It is not a one-to-one copy.
This seems like a good option to me: don't import cs_blobs.h at all, but rather adapt & append the necessary codesigning definitions to existing llvm/include/llvm/BinaryFormat/MachO.h.
- What is best practice regarding header imports that require surgery? It seems wise to import the unaltered Apple header first via a preparatory commit (without a diff? unreviewed?), and then commit this diff with the changes to make it conform to C++ & LLVM style.
There are currently 0 hits for APPLE_LICENSE_HEADER in llvm's codebase. I'm guessing that's not a coincidence :)
lld/MachO/SyntheticSections.h | ||
---|---|---|
489 | constexpr? | |
491 | Since we are currently hard-wiring sha256, we should just use the appropriate constant here. | |
llvm/include/llvm/BinaryFormat/MachO.h | ||
2151 | Can we add some static_asserts for these structs to ensure that they have the correct size? (In case the compiler does weird things) |
lld/MachO/SyntheticSections.h | ||
---|---|---|
491 | Oh crap. The constant is actually private. Nvm then. Perhaps add a comment then? |
lld/MachO/SyntheticSections.h | ||
---|---|---|
491 | SHA256::HASH_LENGTH is defined as 20, which is 160 bits. I am confused. |
lld/MachO/SyntheticSections.h | ||
---|---|---|
491 |
lld/MachO/SyntheticSections.h | ||
---|---|---|
491 | Better. I had a stale version of SHA256.cpp loaded in emacs. No clue where that came from. |
lld/MachO/SyntheticSections.h | ||
---|---|---|
489 | Comments elsewhere say that static constexpr members are broken in MSC, so I avoided them. |
lld/MachO/SyntheticSections.h | ||
---|---|---|
489 | I think those comments are outdated because according to https://llvm.org/docs/GettingStarted.html#host-c-toolchain-both-compiler-and-standard-library, llvm requires >= Visual Studio 2017, which iirc properly supports constexpr. Also there are many places where llvm code utilizes "static constexpr". |
lgtm, will like @cynecx sign off
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
882 | FWIW, StringRef::size() is O(1), so storing it isn't really necessary. But if you did this because it looks more uniform, that's fine too | |
910–912 | would be good to link to https://openradar.appspot.com/FB8914231 | |
917 | I think we already included this namespace at the top of the file | |
943 | JFYI, there are several other places in the code that assume that the mmap'ed buf is initialized to all zeroes. I couldn't find any docs that guarantee that behavior, but it seems to work in practice. That said it doesn't hurt to be explicit | |
lld/MachO/Writer.cpp | ||
790–791 | is this necessary? I don't think hashing a few extra zero bytes makes a difference |
lld/MachO/Writer.cpp | ||
---|---|---|
790–791 | this doesn't seem done |
lld/MachO/Writer.cpp | ||
---|---|---|
790–791 | Yes. codeEnd abuts the code-signature header, which is not hashed. "A few extra bytes" are significant, whether zero or not, and influence the hash. We stop hashing at codeEnd in order to honor the contract between signer and runtime. | |
790–791 | Oh, sorry. I was confused. This is for UUID, not code-signature. I am playing it safe here: only compute UUID for the image as it is before code signature is added. How do you prefer I handle this? |
lld/MachO/Writer.cpp | ||
---|---|---|
790–791 | Isn't codesign doing the exact same thing? So, I guess the current approach is alright. |
I won't complain if someone accepts the diff! ;)
lld/MachO/Writer.cpp | ||
---|---|---|
790–791 | The mechanisms are similar:
The precise use(s) of the UUID is unclear to me, due to lack of documentation. As best I can tell, it is used as a unique identifier for program & library content, i.e., it morphs with every code change. I do not see it used to verify content. |
lld/MachO/Writer.cpp | ||
---|---|---|
790–791 | @gkm you forgot to hit submit the first time you replied ;) yeah the UUID is just used to match .dSYM files with their corresponding executables. It doesn't matter what the value is, or how it's computed, as long as it's deterministic. So I think it's fine to just hash the whole buffer, including the unwritten codesign bits. |
constexpr?