Page MenuHomePhabricator

[lld-macho] add code signature for native arm64 macOS
ClosedPublic

Authored by gkm on Feb 5 2021, 11:08 AM.

Details

Reviewers
int3
Group Reviewers
Restricted Project
Commits
rG151990dd94e5: [lld-macho] add code signature for native arm64 macOS
Summary

ARM64 macOS requires that a program have an ad hoc signature, otherwise the system will immediately reject it.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
gkm added a comment.Feb 13 2021, 1:59 PM
  • drop the fork+exec of /usr/bin/codesign and rewrite in C++
gkm updated this revision to Diff 323573.Feb 13 2021, 2:04 PM
  • remove SHA256, which is added separately in D96540
gkm edited the summary of this revision. (Show Details)Feb 13 2021, 2:16 PM
gkm edited the summary of this revision. (Show Details)Feb 13 2021, 2:18 PM
gkm edited the summary of this revision. (Show Details)
gkm edited the summary of this revision. (Show Details)
gkm edited the summary of this revision. (Show Details)
gkm edited the summary of this revision. (Show Details)Feb 13 2021, 2:21 PM
tschuett added inline comments.Feb 14 2021, 6:46 AM
lld/MachO/SyntheticSections.cpp
948

Would it help readability to change this to:
CS_ADHOC | CS_LINKER_SIGNED ?

Does codesign.go make any differences?

gkm edited the summary of this revision. (Show Details)Feb 14 2021, 12:02 PM
gkm edited the summary of this revision. (Show Details)Feb 14 2021, 12:05 PM

Hmm, this is indeed very weird. I have seen "similar" behavior with codesign. codesign can actually fail. The only way to "get it working" again is to copy the file and codesign the copy, then replacing the original file (a different inode fixes codesigning).

Does the segfault happen with each linking step?

gkm added a comment.Feb 14 2021, 2:29 PM

Hmm, this is indeed very weird. I have seen "similar" behavior with codesign. codesign can actually fail. The only way to "get it working" again is to copy the file and codesign the copy, then replacing the original file (a different inode fixes codesigning).

Does the segfault happen with each linking step?

Not a segfault. SIGKILL, presumably dyld doing kill(0, SIGKILL) though I haven't yet run it under lldb to confirm.

gkm marked an inline comment as done.Feb 14 2021, 5:12 PM

Does codesign.go make any differences?

I haven't tried, but I expect it would work. Resigning with /usr/bin/codesign works. Heck, just copying the binary causes it to work!

lld/MachO/SyntheticSections.cpp
948

Those are good things. I intend to find & import headers with those enums.

gkm marked an inline comment as done.EditedFeb 14 2021, 8:54 PM

Re the copy stuff: See the CL desc and openradar link in https://reviews.llvm.org/rGba3bc2fd41b8428904fc779e353d3769074d8982

That is good to know. This case differs: LLD is not overwriting an existing file, so is not reusing an vnode.

There is something funky in FileOutputBuffer, which creates the LLD output file via mmap(2). When I force it to avoid mmap, all is well. For now, I will work-around by passing the flag F_no_mmap when signing.

gkm added a comment.Feb 14 2021, 9:54 PM
In D96164#2562701, @gkm wrote:

Re the copy stuff: See the CL desc and openradar link in https://reviews.llvm.org/rGba3bc2fd41b8428904fc779e353d3769074d8982

That is good to know. This case differs: LLD is not overwriting an existing file, so is not reusing an vnode. There is something funky in FileOutputBuffer, which creates the LLD output file via mmap(2).

Travis Hill <travis@xojo.com> fed me the answer. The Go peeps had the same problem: https://go-review.googlesource.com/c/go/+/272258
The Darwin kernel caches signature verification results by vnode. mmap(2) creates the cache entry, before we have written anything to the output. The fix is to invalidate the cache after writing via msync(..., MS_INVALIDATE);

gkm edited the summary of this revision. (Show Details)Feb 14 2021, 9:54 PM
gkm updated this revision to Diff 323665.Feb 14 2021, 10:34 PM
  • Invalidate signature-verification cache after writing hashes.
  • replace CodeSigningTypes.h with augmented cs_blobs.h
  • use appropriate constant symbols from cs_blobs.h rather than magic integers
In D96164#2562720, @gkm wrote:
In D96164#2562701, @gkm wrote:

Re the copy stuff: See the CL desc and openradar link in https://reviews.llvm.org/rGba3bc2fd41b8428904fc779e353d3769074d8982

That is good to know. This case differs: LLD is not overwriting an existing file, so is not reusing an vnode. There is something funky in FileOutputBuffer, which creates the LLD output file via mmap(2).

Travis Hill <travis@xojo.com> fed me the answer. The Go peeps had the same problem: https://go-review.googlesource.com/c/go/+/272258
The Darwin kernel caches signature verification results by vnode. mmap(2) creates the cache entry, before we have written anything to the output. The fix is to invalidate the cache after writing via msync(..., MS_INVALIDATE);

Ah, the mmap one is https://openradar.appspot.com/FB8914231 :)

gkm added a comment.Feb 15 2021, 9:31 AM

I could use some advice regarding import of the the Apple C (not C++) header cs_blobs.h:

  • Some struct decls contain zero-length array members as offset markers. These are invalid C++, and LLD doesn't need them.
  • Some struct decls contain flexible array members. These are invalid C++, and LLD doesn't need them.
  • Some flag bit sets are specified via sequences of #define. LLVM prefers enum.
  • All struct decls have __attribute__((aligned(1))). LLD does not need or want these.
  • Struct definition pattern is typedef struct __TAG { ... } NAME; C++ prefers struct NAME { ... };
  • The linter complains about case style for struct names.

Questions:

  • Should we make minimal edits to get proper compilation & semantics and leave all the other warts as-is?
  • Or, should we be boldly invasive and make it conform to C++ and LLVM style, silencing the linter?
  • What is best practice regarding header imports that require surgery? It seems wise to import the unaltered Apple header first via a preparatory commit (without a diff? unreviewed?), and then commit this diff with the changes to make it conform to C++ & LLVM style.

Feel free to ignore me, but:
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/BinaryFormat/MachO.h
seems to be inspired by the Apple headers. It is not a one-to-one copy.

gkm added a comment.Feb 15 2021, 10:09 AM

https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/BinaryFormat/MachO.h
seems to be inspired by the Apple headers. It is not a one-to-one copy.

This seems like a good option to me: don't import cs_blobs.h at all, but rather adapt & append the necessary codesigning definitions to existing llvm/include/llvm/BinaryFormat/MachO.h.

  • What is best practice regarding header imports that require surgery? It seems wise to import the unaltered Apple header first via a preparatory commit (without a diff? unreviewed?), and then commit this diff with the changes to make it conform to C++ & LLVM style.

There are currently 0 hits for APPLE_LICENSE_HEADER in llvm's codebase. I'm guessing that's not a coincidence :)

gkm updated this revision to Diff 324048.Feb 16 2021, 10:24 AM
  • import code signing structs & enums into llvm/include/llvm/BinaryFormat/MachO.h
cynecx added inline comments.Feb 18 2021, 11:09 AM
lld/MachO/SyntheticSections.h
490

constexpr?

492

Since we are currently hard-wiring sha256, we should just use the appropriate constant here.

llvm/include/llvm/BinaryFormat/MachO.h
2151

Can we add some static_asserts for these structs to ensure that they have the correct size? (In case the compiler does weird things)

cynecx added inline comments.Feb 18 2021, 11:15 AM
lld/MachO/SyntheticSections.h
492

Oh crap. The constant is actually private. Nvm then. Perhaps add a comment then?

gkm added inline comments.Feb 18 2021, 11:36 AM
lld/MachO/SyntheticSections.h
492

SHA256::HASH_LENGTH is defined as 20, which is 160 bits. I am confused.

gkm added inline comments.Feb 18 2021, 12:09 PM
lld/MachO/SyntheticSections.h
492

Better. I had a stale version of SHA256.cpp loaded in emacs. No clue where that came from.

gkm marked 4 inline comments as done.Feb 19 2021, 7:01 PM
gkm added inline comments.
lld/MachO/SyntheticSections.h
490

Comments elsewhere say that static constexpr members are broken in MSC, so I avoided them.

cynecx added inline comments.Feb 20 2021, 8:56 AM
lld/MachO/SyntheticSections.h
490

I think those comments are outdated because according to https://llvm.org/docs/GettingStarted.html#host-c-toolchain-both-compiler-and-standard-library, llvm requires >= Visual Studio 2017, which iirc properly supports constexpr. Also there are many places where llvm code utilizes "static constexpr".

gkm updated this revision to Diff 325231.Feb 20 2021, 12:44 PM
gkm marked an inline comment as done.
  • revise according to review feedback
int3 added a comment.Feb 20 2021, 11:03 PM

lgtm, will like @cynecx sign off

lld/MachO/SyntheticSections.cpp
903

FWIW, StringRef::size() is O(1), so storing it isn't really necessary. But if you did this because it looks more uniform, that's fine too

931–933
938

I think we already included this namespace at the top of the file

964

JFYI, there are several other places in the code that assume that the mmap'ed buf is initialized to all zeroes. I couldn't find any docs that guarantee that behavior, but it seems to work in practice. That said it doesn't hurt to be explicit

lld/MachO/Writer.cpp
796–797

is this necessary? I don't think hashing a few extra zero bytes makes a difference

gkm updated this revision to Diff 325314.Feb 21 2021, 9:36 AM
gkm marked 5 inline comments as done.
  • revise according to review feedback
gkm updated this revision to Diff 325316.Feb 21 2021, 9:42 AM
  • set version as ... s/0x20400/CS_SUPPORTSEXECSEG/
int3 added inline comments.Feb 21 2021, 9:56 AM
lld/MachO/Writer.cpp
796–797

this doesn't seem done

gkm marked an inline comment as done.Feb 21 2021, 11:56 AM
gkm added inline comments.
lld/MachO/Writer.cpp
796–797

Yes. codeEnd abuts the code-signature header, which is not hashed. "A few extra bytes" are significant, whether zero or not, and influence the hash. We stop hashing at codeEnd in order to honor the contract between signer and runtime.

796–797

Oh, sorry. I was confused. This is for UUID, not code-signature. I am playing it safe here: only compute UUID for the image as it is before code signature is added.

How do you prefer I handle this?

cynecx added inline comments.Feb 21 2021, 12:29 PM
lld/MachO/Writer.cpp
796–797

Isn't codesign doing the exact same thing? So, I guess the current approach is alright.

gkm marked 2 inline comments as done.Feb 21 2021, 1:31 PM

I won't complain if someone accepts the diff! ;)

lld/MachO/Writer.cpp
796–797

The mechanisms are similar:

  • LC_UUID has a minimal header and signs with md5
  • LC_CODE_SIGNATURE has a more elaborate header and signs with sha256

The precise use(s) of the UUID is unclear to me, due to lack of documentation. As best I can tell, it is used as a unique identifier for program & library content, i.e., it morphs with every code change. I do not see it used to verify content.

int3 added inline comments.Feb 21 2021, 1:49 PM
lld/MachO/Writer.cpp
796–797

@gkm you forgot to hit submit the first time you replied ;)

yeah the UUID is just used to match .dSYM files with their corresponding executables. It doesn't matter what the value is, or how it's computed, as long as it's deterministic. So I think it's fine to just hash the whole buffer, including the unwritten codesign bits.

gkm marked 2 inline comments as done.Feb 21 2021, 7:21 PM
gkm updated this revision to Diff 325363.Feb 21 2021, 7:21 PM
  • revise according to review feedback
gkm updated this revision to Diff 325373.Feb 21 2021, 10:30 PM
  • wrap #if defined(__APPLE__) around <sys/mman.h> and msync()
int3 accepted this revision.Feb 22 2021, 8:35 AM

lgtm

This revision is now accepted and ready to land.Feb 22 2021, 8:35 AM
gkm updated this revision to Diff 326161.Feb 24 2021, 11:37 AM
  • expand some comments, improve some alignment expressions
gkm updated this revision to Diff 326245.Feb 24 2021, 5:05 PM
  • final alignment & comment tweaks
This revision was landed with ongoing or failed builds.Feb 24 2021, 5:05 PM
This revision was automatically updated to reflect the committed changes.

Did this land without any tests?

Did this land without any tests?

Yes, this was an oversight. Thanks for adding a test in D97994.