This is an archive of the discontinued LLVM Phabricator instance.

Differential D109972

Regenerate LC_CODE_SIGNATURE during llvm-objcopy operations
Abandoned, Public

Authored by drodriguez on Sep 17 2021, 9:02 AM.

Details

Reviewers

alexander-shaposhnikov
rupprecht
jhenderson
int3
steven_wu
davide
nuriamari

Summary

Prior to this change, if an LC_CODE_SIGNATURE load command
was included in the binary passed to llvm-objcopy, the command and
associated section were simply copied and included verbatim in the
new binary. If the rest of the binary was modified at all, this resulted
in an invalid Mach-O file. This change regenerates the signature
rather than copying it.

The code_signature_lc.test test was modified to include the yaml
representation of a small signed MachO executable in order to
effectively test the signature generation.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nuriamari created this revision. Sep 17 2021, 9:02 AM

Herald added a reviewer: alexander-shaposhnikov. Sep 17 2021, 9:02 AM

Herald added a reviewer: rupprecht.

Herald added a reviewer: jhenderson.

Herald added a subscriber: abrachet.

Fix variable name style, clarify CHECK prefix

nuriamari added reviewers: int3, drodriguez. Sep 17 2021, 9:17 AM

nuriamari published this revision for review. Sep 17 2021, 9:33 AM

Herald added a project: Restricted Project. Sep 17 2021, 9:33 AM

Herald added subscribers: llvm-commits, MaskRay.

Harbormaster completed remote builds in B124419: Diff 373249. Sep 17 2021, 10:13 AM

Correct output file name resolution for windows

Herald added a subscriber: hiraditya. Sep 17 2021, 2:33 PM

Harbormaster completed remote builds in B124483: Diff 373338. Sep 17 2021, 3:12 PM

This comment has been deleted.

llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
441	It looks like I've missed the previous diff https://reviews.llvm.org/D109803, sorry about being late.
1/ MachO.h is probably not the ideal place for this functionality: according to the top-of-file comment, "MachO.h declares the MachOObjectFile class". My first impression is that the public declaration should probably be placed into a separate file, e.g. "MachOCodeSignature.h" (and we need a detailed comment explaining what this class or function does).
2/ The responsibility of the class CodeSignatureSection seems somewhat unclear. Perhaps we don't really need a class here at all. What would you say to just creating a function, e.g. std::string buildCodeSignatureStab(...) or void writeCodeSignatureStab(...), with a clear separation of the input and output, removing the unrelated pieces of functionality (e.g. msync on the buffer, which is probably not suitable for a library function) and not exposing the implementation details in the header file?
3/ Some minor comments, e.g. stripOutputFilePath: I suspect Support/Path.h might already have some helper utilities.
What I would recommend here: can we (temporarily) take a step back and reopen & revert D109803? If it were not a library interface I would not be so worried, but for common and frequently used libraries like libObject I would strongly suggest that we clean up the interface first, and I'm more than happy to help.

drodriguez added inline comments. Sep 17 2021, 6:53 PM

llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
441	Hi Alex,
We would prefer to fix forward instead of reverting, if possible. As it sits, D109803 hasn't changed anything and is just "dormant" code as far as LLVM is concerned. Only this diff uses that functionality. I will review your comment more carefully next week and provide better answers.
1/ I think we can do this. It should not be a problem.
2/ The class tries to imitate the existing code in LLD, which was a class. There are two related pieces of functionality in the class: measuring the size, and serializing. Both need the same inputs, so it made sense to tie data and behaviour together.
2b/ `msync` on the buffer is actually important, and `llvm-objcopy` might not work unless we make that call. IIRC, without it, if one rewrites the same file while that file is mmap'ed into memory, the kernel will not realize that the signature has changed and will not evaluate it again. I can look for a reference that explains the problem in a little more detail.
3/ We realized that it was not portable to Windows, and Nuri changed that this afternoon to use `llvm::sys::path` instead. This diff has the fix, since the previous one had already landed.
Thanks for offering to help. Any feedback is welcome. We will try to get to the best version of this that satisfies everyone's requirements.
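The portability issue in point 3/ can be illustrated with a small standalone sketch. It is not the LLVM implementation: `baseName` and `baseNameSlashOnly` are hypothetical helpers written with the standard library only, and treating both separators on every host is a simplification made for the illustration. The slash-only variant follows the logic of the removed `stripOutputFilePath`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not LLVM's code): returns the last path component,
// treating both '/' and '\\' as separators regardless of the host.
std::string_view baseName(std::string_view Path) {
  const std::size_t LastSep = Path.find_last_of("/\\");
  if (LastSep == std::string_view::npos)
    return Path;
  return Path.substr(LastSep + 1);
}

// Slash-only variant, following the logic of the removed
// stripOutputFilePath; shown for comparison.
std::string_view baseNameSlashOnly(std::string_view Path) {
  const std::size_t LastSlash = Path.rfind('/');
  if (LastSlash == std::string_view::npos)
    return Path;
  return Path.substr(LastSlash + 1);
}

// Small usage demonstration.
void demoBaseName() {
  // A Windows-style path defeats the slash-only variant.
  assert(baseNameSlashOnly("C:\\out\\a.out") == "C:\\out\\a.out");
  assert(baseName("C:\\out\\a.out") == "a.out");
  // POSIX-style paths behave the same in both variants.
  assert(baseName("/tmp/a.out") == baseNameSlashOnly("/tmp/a.out"));
}
```

Because the file name is embedded in the signature headers, a wrong base name changes both the header size and the hashed content, which is presumably why the Windows fix mattered for the test.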

alexander-shaposhnikov added inline comments. Sep 17 2021, 8:32 PM

llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
441	Hi! Many thanks! My $0.02:
The current design looks a little bit suboptimal, especially for a library, so I suggest that we improve the interface first and put things into the right place. It's a bit hard to post comments outside of the code context, but yeah, I'm late to the party.
The LLD-specific details ("CodeSignatureSection") can still live in the LLD codebase and make use of the function from libObject. If there are no significant performance differences, I would be biased towards readability / simplicity. For example, one can consider something like this (I didn't verify the details, so maybe some parameters are unnecessary or, instead, missing):
    SmallVector<char, 64> buildCodeSignatureStab(
        uint64_t SignatureOffset, MachO::HeaderFileType FileType,
        StringRef FilePath, uint64_t TextSegmentOffset,
        StringRef TextSegmentContent)
In LLD, the method CodeSignatureSection::finalize would call this function. Would that be sufficient? (Maybe I'm missing something, feel free to correct me.) In this case we would also avoid having two different CodeSignatureSection classes.
Mostly motivated by the "separation of concerns", I'd leave the unfortunate workaround with msync on the application's side; hopefully it'll be gone in the future.

Some stylistic comments. I'll leave @alexander-shaposhnikov to do the real review.

llvm/tools/llvm-objcopy/MachO/MachOLayoutBuilder.h
21	We don't usually bother marking member variables as `const` in the llvm-objcopy code, so I'd drop the `const` here and in the constructor parameter argument.
llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
10	You should be able to include this as in the inline edit (cf ELF/ELFObjcopy.cpp which doesn't have the relative include). Same goes elsewhere.
llvm/tools/llvm-objcopy/MachO/Object.h
101	Nit: comments must end in a full stop. Same in the two other cases below.

@jhenderson: thanks for the feedback, we will apply those when we figure out the best shape for these changes.

llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
441	About the ideal place for this code: We thought about libObject, and specifically MachO.h, because it seemed like the best place, but we are fine relocating the code if necessary. We have also seen libBinaryFormat, which includes some support for MachO and might be a good place for this. The basic structs for the code signature (such as `CS_CodeDirectory`) are there, so it might be a good idea to keep everything together. Our only requirement is that the code should be accessible to both llvm-objcopy and LLD, to avoid code duplication.
These are not LLD specific details, as far as we understand, but part of how MachO binaries are put together, and every tool that creates a MachO binary will need to play by these rules (they apply to each slice independently, so in our testing Universal MachO does not need changes, but there might be edge cases we are not aware of).
About the shape of the code: We don't mind removing the class and having a couple of free functions instead, but the responsibilities of the class may grow in the future, and we would like to avoid functions with a lot of parameters (there are already 4 that both the serialization and the size calculation would need to receive).
About your proposal of `buildCodeSignatureStab`: is there an example of that pattern already in the code? We have looked for similar constructs and we cannot find it. Also, since "Stab" is a term of art related to dSYM, it may not be the best name to use (alternatives could be something like "Blob", or "serialize" instead of "build"). Using `SmallVector` might also be limiting for this usage, since the signature size depends on the size of the binary: each 4 KB of binary needs an extra 32 bytes. I have seen signatures of 15 KB for LLVM binaries. We would prefer the versions that write directly into memory, to avoid excessive copying of data.
Finally, about `msync`: While we can move the `msync` calls into the application code, and add a note in the documentation of the method that the caller must ensure that `msync` gets invoked, we think it is better to keep the `msync` in only one place, with a clear explanation of why it is necessary. The details are in https://openradar.appspot.com/FB8914231, but in short: when mmapping an executable, the kernel seems to cache the results of the signature verification and might not check the validity of the binary again, even if the signature gets regenerated. This is very typical when iterating on a binary and creating it over and over, so it can happen in both LLD and llvm-objcopy (especially in llvm-strip). Apple might fix their side eventually, but until then we need the workaround everywhere, so we think a central place is better.
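The `msync` workaround described above could look roughly like the following sketch. It is an illustration under stated assumptions, not the code from this diff: `invalidateSignatureCache` is a hypothetical name, the buffer is assumed to be the page-aligned start of the mmap'ed output file, and the call is compiled in only on Apple hosts (the diff guards its `<sys/mman.h>` include the same way).

```cpp
#include <cassert>
#include <cstddef>

#if defined(__APPLE__)
#include <sys/mman.h>
#endif

// Hypothetical helper. MappedBase is assumed to be the page-aligned start of
// the mmap'ed output file and Length the number of bytes covering the binary
// plus the regenerated signature.
// Returns true on success. On hosts other than macOS it does nothing, since
// the kernel signature cache being worked around is macOS-specific.
bool invalidateSignatureCache(void *MappedBase, std::size_t Length) {
  assert(MappedBase != nullptr && "expected a mapped output buffer");
#if defined(__APPLE__)
  // MS_INVALIDATE asks the kernel to invalidate cached data for the mapping,
  // so a stale signature-verification result is not reused.
  return msync(MappedBase, Length, MS_INVALIDATE) == 0;
#else
  (void)Length; // Unused outside macOS.
  return true;
#endif
}
```

Keeping the call in a single helper like this is one way to satisfy both positions in the thread: the library code stays free of the workaround, while each tool calls the same well-commented function after writing the signature.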

alexander-shaposhnikov requested changes to this revision. Sep 20 2021, 1:43 PM

alexander-shaposhnikov added subscribers: mtrent, lattner, steven_wu.

alexander-shaposhnikov added inline comments.

llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
441	"is there an example of that pattern already in the code? We have looked for similar constructs and we cannot find it."
If I'm not mistaken, libObject was initially designed to present a read-only view of object files and closely models the binary formats, so, in general, what was added in https://reviews.llvm.org/D109803 does not have many analogs or counterparts there (to the best of my knowledge), although libObject contains some functionality to write archives and universal binaries. I find the interface introduced in https://reviews.llvm.org/D109803 quite confusing and still think it would be better to revisit that decision (and revert for now) before it has propagated further. This can be subjective, so I'm happy if somebody else familiar with libObject weighs in; maybe I'm missing something.
If I am not mistaken, this code doesn't use the signer's identity in any way, so it's more like a collection of hashes. That's why I called it a "stab" above. I think it also indicates that we need a detailed comment (in the source code) that clarifies these aspects.
"These are not LLD specific details, as far as we understand, but part of how MachO binaries"
LLD uses the term "section" to model various parts of the LINKEDIT segment; however, the binary format does not use this term (to the best of my knowledge) for this purpose. The class object::CodeSignatureSection looks very similar to what was initially implemented in LLD, but I'm not sure it's a good fit for libObject. This is what I was referring to as "LLD-specific details" above.
Subscribing more people who are familiar with libObject & Mach-O: cc @mtrent, @lattner, @steven_wu

This revision now requires changes to proceed. Sep 20 2021, 1:43 PM

alexander-shaposhnikov added a reviewer: steven_wu. Sep 20 2021, 1:44 PM

steven_wu added a reviewer: davide. Sep 21 2021, 11:50 AM

+ @davide, who might have more context on the LC_CODE_SIGNATURE section.

I totally agree that the API/implementation you copied from lld is very much not in the style of libObject. There should not be a write function inside libObject (you can reimplement it inside llvm-objcopy or other utilities).
I am also not quite sure what your use case for llvm-objcopy is. It is not a tool available for Darwin, and there is no guarantee that an objcopy of the binary can still run (there could be a bincompat problem).

drodriguez added inline comments. Sep 21 2021, 12:07 PM

llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
441	It seems fair that `libObject` is mainly for reading files. `libBinaryFormat` is where some of the structs used by `LC_CODE_SIGNATURE` are defined. There are small functions in there for MachO (mostly getters, but some setters). Would that be a good place? It also has a `MsgPackWriter`, which seems mostly unrelated, but seems to show that "writers" are OK in that library.
About being confusing: we are fine changing the couple of methods into free functions if that allows us to advance further. We chose a class because it already had that shape in LLD, and it made sense to keep that design. Would that be OK for everyone? It would be one function for measuring the size, and one function to write the headers of the data contained in `LC_CODE_SIGNATURE` and calculate the appropriate hashes.
About the name "code signing": you are completely right that this does not use the signer identity at all and is just a bunch of hashes. I think it should be understood more as a binary "signature" than as a cryptographic signature of the code. It is mostly an HMAC, but without the cryptographic pieces. From what I understand, it is just a way for the linker to output "verified" binaries on Apple Silicon Macs without having to set up signing identities and the like. We will add more comments to the code to better explain the purpose and which functionality is implemented.
About the term "section": Yes, LLD implements the linkedit segment with synthetic sections, and that name did slip through; I did not realize what you were referring to. These are not what MachO understands as sections. We will change that and probably make it clear that it is related to `CS_CodeDirectory`, `CS_SuperBlob` and `CS_BlobIndex`.
In summary, if everyone agrees:
- Move the functionality outside `libObject`, probably into `libBinaryFormat`, where the struct `CS_CodeDirectory` already lives.
- Transform the functionality into a couple of free functions that receive all the necessary information.
- Do not mention "Section" in relation to any of this functionality.
- Also try not to mention "code signature" that much, if a better name is available.
- Also apply the feedback provided by James Henderson above.
We hope the plan sounds good to everyone. Let us know if someone has more feedback that we can integrate.

steven_wu added inline comments. Sep 21 2021, 1:17 PM

llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
441	I don't think `libBinaryFormat` is a good place to implement write. I think a good design would be the following:
- libObject (CodeSignatureSection) knows how to read the section and generate the section data.
- CodeSignatureSection can return the generated section data in a buffer if it has to.
- Binary write should be implemented inside the tools (like llvm-objcopy, lld or yaml2obj).
That is not really code duplication.

@steven_wu: Thanks for the feedback. I think at this point we are completely convinced that libObject was not the right place. We are trying to find a good place, and we would like, as much as possible, not to duplicate the same code or logic in two tools. It is not a lot of code right now, but the potential divergence in the future scares us a little.

The use case for llvm-objcopy is that, when using llvm-strip on binaries linked by ld64 or lld, the signature is currently copied verbatim, which makes the binary invalid on Apple Silicon machines. The changes here recalculate the signature in order to make the binary valid again. This seems to be what happens when using Xcode's own strip (which is not based on LLVM), and it is something that we wanted to replicate in order to keep being compatible. I am not sure what you are referring to with the "bincompat problem", but if you elaborate, I can try to figure out whether there is a problem that we haven't taken into account.

llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
441	@steven_wu: Thanks for the input. Let me try to see if we understand each other correctly.
Neither LLD nor llvm-objcopy is really interested in "reading" the contents of the data from the `LC_CODE_SIGNATURE`, and I would not like to design an API that has no users. If we ever need to read from the section, after Alexander's explanations above we now know that the right place is `libObject`, and we will add the code there.
To generate the data, one needs access to the fully laid-out binary (slice). The process involves hashing every page of the binary (4 KB) and appending a hash of the contents. The data "header" can be written without access to the binary contents, but one still needs some information (mainly about offsets).
The binary write that we have tried to abstract from the tools still requires the tools to provide a buffer where the data is written (and a buffer from which the binary contents are read, to generate the hashes). It was a happy coincidence that both tools worked with buffers, but if at some point in the future the writing mechanism has to be abstracted away, that should still be possible. I would not like to have the same process in each of the tools, if possible, since there are tricky pieces that would be bad to have in two parts of the code base (this was the main reason for bringing the LLD implementation into LLVM: we didn't want to simply duplicate the code).
We are fine moving the `msync` out of this and into the tools, if that's necessary to make everyone happy. We do not think it is the right decision, but we are fine making that change.
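The per-page hashing step described above can be sketched as follows. This is a minimal illustration, not the code under review: `hashBlocks`, `Digest`, `HashFn` and `toySumHash` are hypothetical names, the hash is passed in as a callback standing in for SHA-256, and the sketch assumes the hashed range is simply the bytes that precede the signature (which is how `getBlockCount` in the diff derives the count from `FileOff`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Digest = std::vector<std::uint8_t>;
// Stand-in for a real hash such as SHA-256 (assumption for this sketch).
using HashFn =
    std::function<Digest(const std::uint8_t *Data, std::size_t Size)>;

// Hashes Size bytes starting at Data in chunks of BlockSize bytes and returns
// one digest per chunk, in order. The last chunk may be shorter.
std::vector<Digest> hashBlocks(const std::uint8_t *Data, std::size_t Size,
                               std::size_t BlockSize, const HashFn &Hash) {
  assert(BlockSize > 0 && "block size must be non-zero");
  std::vector<Digest> Digests;
  Digests.reserve((Size + BlockSize - 1) / BlockSize);
  for (std::size_t Off = 0; Off < Size; Off += BlockSize) {
    const std::size_t Len = std::min(BlockSize, Size - Off);
    Digests.push_back(Hash(Data + Off, Len));
  }
  return Digests;
}

// Toy hash used only to exercise the loop: a single byte holding the sum of
// the input bytes modulo 256. It is NOT suitable for real signatures.
Digest toySumHash(const std::uint8_t *Data, std::size_t Size) {
  std::uint8_t Sum = 0;
  for (std::size_t I = 0; I < Size; ++I)
    Sum = static_cast<std::uint8_t>(Sum + Data[I]);
  return Digest{Sum};
}
```

The sketch returns the digests instead of writing them in place. The thread discusses both options; returning them keeps input and output clearly separated at the cost of an extra copy.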

In D109972#3013610, @drodriguez wrote:

This seems to be what happens when using Xcode's own strip (which is not based on LLVM), and it is something that we wanted to replicate in order to keep being compatible.

I changed Apple's strip to call out to an os-supplied library present in the Xcode toolchain in order to build the code directory section for Apple Silicon. I believe Apple's linker is using this same strategy. The os-supplied library is now a dylib, which allows the details of the code directory to change without modifying the linker or strip, and it means the details may change between versions or maybe even be different between toolchains. I don't know how often this changes in practice.

That's something you should keep in mind while reverse-engineering this data structure for objcopy.

@mtrent: I noticed that change around the time of Xcode 12 (thanks!). However, even if the headers were available, I don't think LLVM would accept it as a dependency. What we are trying to do here is similar, with a common piece of code that handles the generation of these parts of the MachO binary. In the future, if it needs to be modified for some reason, all the tools should enjoy the improvements after a recompilation, instead of chasing down every tool implementation and modifying them.

I think there are a few considerations here, so to help unblock this effort below I'll try to summarize my understanding/perspective.

1/ Regarding the code placement: if there is currently no good place for it, with a deep sigh I'd be okay with having a version of it in objcopy. This duplication is not ideal; to me it somehow feels ok to have it in some library (e.g. ArchiveWriter is in libObject and is used in multiple places), but I would recommend not placing it into the existing headers that essentially document the binary format and are not directly related to the new functionality. But if there is no good place, or there are strong objections, okay: it's important to keep the public interfaces in good shape.

2/ The blob ("signature" or "signature stab", not sure which name is better here) itself doesn't appear to be parsed anywhere, so perhaps no new parsing code is necessary in libObject at the moment. If I understand correctly, this diff and the previous one try to add the functionality to generate this blob. I think having it in-tree has some benefits (though, as @mtrent has pointed out, it's fragile). The main one is that it allows the tools to be hosted pretty much anywhere, e.g. on Windows or Linux (similarly to other utilities such as llvm-objdump, llvm-readobj, etc.). The downside is that it can break if/when the format changes (but if I understand correctly, we are kind of already living with that in LLD).

3/ After looking at the implementation, the main concerns (mentioned above) are the following: from the readability perspective it seems suboptimal to mix input and output together and to use a "hybrid" approach to parameter passing. It's also not obvious from the declaration that the buffer passed to CodeSignatureSection::write(...) must contain "a partially constructed mach-o object"; that's why the interface looked kind of complicated, and anyone reading this code will have to dig into the actual implementation details. The main work is done by the method CodeSignatureSection::write(...), so it seemed natural to have a function that either simply returns the blob (the preferred way) or, if it's performance-critical (need numbers), writes the content in place, but I'd try to make the signature more intuitive (e.g. it should be clear to the reader what the input and output are). If we end up introducing a class, it would be good to clean up the interface (the same considerations as above), hide the implementation details and probably use an appropriate name (e.g. CodeSignatureBuilder or <your suggestion is here>).

P.S. unlike StringTableBuilder here the whole thing is constructed in a single step, that's why my first impression was that a simpler solution should suffice, but I do not insist on it.
P.P.S. the current code makes sense in LLD's context, but the situation changes once we step away from it.

A follow-up: after some internal discussion, we have decided that the best way forward is the following:

- Revert the previous commit as soon as possible.
- Abandon this commit.
- We will try to make the modifications locally in objcopy, and add enough notices to point people towards modifying LLD at the same time to avoid divergence. Hopefully, when the code needs to be added to a third tool (like install-name-tool or similar), we can figure out a way of reducing duplication or sharing more common pieces.
- We are aware of the fragility of the code. As you point out, those problems were already present in LLD. The only good thing we can say is that, judging by the versioning, backwards compatibility seems to have been well thought out, so we can only hope that it will work for some time before we have to play catch-up again.
- We will try to make it more obvious, with better documentation, better naming, and better function signatures, that this processing needs a full Mach-O binary in the buffer, and what is going to happen to it. We were probably so influenced by our effort to understand how the code worked in LLD that we did not see that it might not be easy to understand for someone without that experience.

llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
441	@steven_wu: Thanks for your feedback. You can read the next steps we plan to take in the other comment. I think many of the points will not make sense in the new approach, but if you think something is important, or you want to comment on my points above, we are happy to hear your feedback.

drodriguez mentioned this in D110974: Revert "Extract LC_CODE_SIGNATURE related implementation out of LLD". Oct 1 2021, 2:34 PM

Reverted the previous one in https://reviews.llvm.org/D110974

drodriguez abandoned this revision. Oct 1 2021, 2:35 PM

drodriguez marked 3 inline comments as done.

smeenai added a subscriber: smeenai. Oct 1 2021, 2:38 PM

In D109972#3013964, @drodriguez wrote:

@mtrent: I noticed that change around the time of Xcode 12 (thanks!). However, even if the headers were available, I don't think LLVM would accept it as a dependency. What we are trying to do here is similar, with a common piece of code that handles the generation of these parts of the MachO binary. In the future, if it needs to be modified for some reason, all the tools should enjoy the improvements after a recompilation, instead of chasing down every tool implementation and modifying them.

That's fine. I'm just explaining that the code directory format is not part of/defined by the Mach-O file format.

drodriguez mentioned this in rG657f02d45804: Revert "Extract LC_CODE_SIGNATURE related implementation out of LLD". Oct 1 2021, 5:21 PM

nuriamari mentioned this in D111164: Regenerate LC_CODE_SIGNATURE during llvm-objcopy operations. Oct 5 2021, 9:59 AM

drodriguez mentioned this in rGa299b24712cc: Regenerate LC_CODE_SIGNATURE during llvm-objcopy operations. Oct 26 2021, 2:52 PM

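Revision Contents

Path	Size
llvm/include/llvm/Object/MachO.h	2 lines
llvm/lib/Object/CodeSignatureSection.cpp	12 lines
llvm/test/tools/llvm-objcopy/MachO/Inputs/code-signature-check.py	1 line
llvm/test/tools/llvm-objcopy/MachO/code_signature_lc.test	253 lines
llvm/tools/llvm-objcopy/MachO/MachOLayoutBuilder.h	7 lines
llvm/tools/llvm-objcopy/MachO/MachOLayoutBuilder.cpp	23 lines
llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp	3 lines
llvm/tools/llvm-objcopy/MachO/MachOReader.cpp	11 lines
llvm/tools/llvm-objcopy/MachO/MachOWriter.h	8 lines
llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp	22 lines
llvm/tools/llvm-objcopy/MachO/Object.h	9 lines
llvm/tools/llvm-objcopy/MachO/Object.cpp	24 lines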

Diff 373338

llvm/include/llvm/Object/MachO.h

(753 preceding lines not shown; hunk context: public:)

   void write(uint8_t *Buf) const;

 private:
   uint32_t getAllHeadersSize() const;
   uint32_t getBlockCount() const;
   uint32_t getFileNamePad() const;

-  StringRef stripOutputFilePath(const StringRef OutputFilePath);

   // FileOff is the offset relative to the start of the file
   // used to access the start of code signature section
   // in __LINKEDIT segment
   uint64_t FileOff;
   StringRef OutputFileName;
   MachO::HeaderFileType OutputFileType;
   uint64_t TextSegmentFileOff;
   uint64_t TextSegmentFileSize;
 };

 } // end namespace object
 } // end namespace llvm

 #endif // LLVM_OBJECT_MACHO_H

llvm/lib/Object/CodeSignatureSection.cpp

 //===- CodeSignatureSection.cpp - CodeSignatureSection class definition ---===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file defines the CodeSignatureSection class
 //
 //===----------------------------------------------------------------------===//

 #include "llvm/BinaryFormat/MachO.h"
 #include "llvm/Object/MachO.h"
 #include "llvm/Support/Endian.h"
+#include "llvm/Support/Path.h"
 #include "llvm/Support/SHA256.h"
 #include <cassert>

 #if defined(__APPLE__)
 #include <sys/mman.h>
 #endif

 using namespace llvm;
 using namespace object;
 using namespace support::endian;

 static_assert((CodeSignatureSection::BlobHeadersSize % 8) == 0, "");
 static_assert((CodeSignatureSection::FixedHeadersSize % 8) == 0, "");

 CodeSignatureSection::CodeSignatureSection(uint64_t FileOff,
                                            StringRef OutputFilePath,
                                            MachO::HeaderFileType OutputFileType,
                                            uint64_t TextSegmentFileOff,
                                            uint64_t TextSegmentFileSize)
-    : FileOff{FileOff}, OutputFileName{stripOutputFilePath(OutputFilePath)},
+    : FileOff{FileOff}, OutputFileName{sys::path::filename(OutputFilePath)},
       OutputFileType{OutputFileType}, TextSegmentFileOff{TextSegmentFileOff},
       TextSegmentFileSize{TextSegmentFileSize} {}

-StringRef
-CodeSignatureSection::stripOutputFilePath(const StringRef OutputFilePath) {
-  const size_t LastSlashIndex = OutputFilePath.rfind("/");
-  if (LastSlashIndex == std::string::npos)
-    return OutputFilePath;
-
-  return OutputFilePath.drop_front(LastSlashIndex + 1);
-}

 uint32_t CodeSignatureSection::getAllHeadersSize() const {
   return alignTo<Align>(FixedHeadersSize + OutputFileName.size() + 1);
 }

 uint32_t CodeSignatureSection::getBlockCount() const {
   return (FileOff + BlockSize - 1) / BlockSize;
 }

(87 following lines not shown)
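The two helpers visible in the hunk above are enough to sketch how the size of the regenerated signature might be computed. The following is an illustration under stated assumptions, not the code from the collapsed part of the file: the 4 KB block and 32-byte hash figures come from the discussion, while the 16-byte alignment, the 112-byte `FixedHeadersSize`, and the formula "headers plus one hash per block" are assumptions that should be checked against the real class.

```cpp
#include <cassert>
#include <cstdint>

// Assumed constants; their real values are not visible in this hunk.
constexpr std::uint64_t BlockSize = 4096;       // bytes covered by one hash
constexpr std::uint64_t HashSize = 32;          // bytes per hash (SHA-256)
constexpr std::uint64_t Align = 16;             // assumed header alignment
constexpr std::uint64_t FixedHeadersSize = 112; // assumed fixed header bytes

// Rounds Value up to the next multiple of Alignment.
constexpr std::uint64_t alignTo(std::uint64_t Value, std::uint64_t Alignment) {
  return (Value + Alignment - 1) / Alignment * Alignment;
}

// Same arithmetic as getBlockCount(): blocks needed to cover [0, FileOff).
constexpr std::uint64_t blockCount(std::uint64_t FileOff) {
  return (FileOff + BlockSize - 1) / BlockSize;
}

// Same arithmetic as getAllHeadersSize(): fixed headers plus the
// NUL-terminated file name, rounded up to the alignment.
constexpr std::uint64_t allHeadersSize(std::uint64_t FileNameLength) {
  return alignTo(FixedHeadersSize + FileNameLength + 1, Align);
}

// Assumed total: all headers followed by one hash per block.
constexpr std::uint64_t signatureSize(std::uint64_t FileOff,
                                      std::uint64_t FileNameLength) {
  return allHeadersSize(FileNameLength) + blockCount(FileOff) * HashSize;
}

static_assert(blockCount(16544) == 5, "16544 bytes span five 4 KB blocks");
```

With the `dataoff` of 16544 from the test and a hypothetical 31-character output file name, `signatureSize(16544, 31)` evaluates to 144 + 5 * 32 = 304 under these assumptions. That coincides with the `datasize: 304` expected by the CHECK-COPY lines, although the name length actually used by the test is not stated on this page.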

llvm/test/tools/llvm-objcopy/MachO/Inputs/code-signature-check.py

This file was added.

Property	Old Value	New Value
File Mode	null	120000

				../../../../../../lld/test/MachO/Inputs/code-signature-check.py
				No newline at end of file

llvm/test/tools/llvm-objcopy/MachO/code_signature_lc.test

	# RUN: yaml2obj %s -o %t			# RUN: yaml2obj %s -o %t

	## Verify that the input file is valid and contains the expected load command.			## Verify that the input file is valid and contains the expected load command.
	# RUN: llvm-objdump --private-headers %t | FileCheck %s			# RUN: llvm-objdump --private-headers %t | FileCheck %s --check-prefix=CHECK-ORIGINAL

	# CHECK: cmd LC_CODE_SIGNATURE			# CHECK-ORIGINAL: cmd LC_CODE_SIGNATURE
	# CHECK-NEXT: cmdsize 16			# CHECK-ORIGINAL-NEXT: cmdsize 16
	# CHECK-NEXT: dataoff 128			# CHECK-ORIGINAL-NEXT: dataoff 16544
	# CHECK-NEXT: datasize 16			# CHECK-ORIGINAL-NEXT: datasize 280

	# RUN: llvm-objcopy %t %t.copy			# RUN: llvm-objcopy %t %t.copy
	# RUN: cmp %t %t.copy			# RUN: obj2yaml %t > %t.yaml
				# RUN: obj2yaml %t.copy > %t.copy.yaml

				## Verify that the copy still includes the load command
				# RUN: cat %t.copy.yaml | FileCheck %s --check-prefix=CHECK-COPY
				# CHECK-COPY: - cmd: LC_CODE_SIGNATURE
				# CHECK-COPY-NEXT: cmdsize: 16
				# CHECK-COPY-NEXT: dataoff: 16544
				# CHECK-COPY-NEXT: datasize: 304

				## Remove information changed by regeneration of load command:
				## - __LINKEDIT segment filesize may change
				## - LC_CODE_SIGNATURE command dataoff and datasize may change
				## - __LINKEDIT data locations may change

				# RUN: sed -e '/__LINKEDIT/,+4d' \
				# RUN: -e '/LC_CODE_SIGNATURE/,+3d' \
				# RUN: -e '/n_strx/d' \
				# RUN: -e '/dyld_stub_binder/d' %t.yaml > %t.clean.yaml

				# RUN: sed -e '/__LINKEDIT/,+4d' \
				# RUN: -e '/LC_CODE_SIGNATURE/,+3d' \
				# RUN: -e '/n_strx/d' \
				# RUN: -e '/dyld_stub_binder/d' %t.copy.yaml > %t.copy.clean.yaml

				## Verify the remainder of the object file remains unchanged
				# RUN: diff %t.clean.yaml %t.copy.clean.yaml

				## Verify the new signature is valid
				# RUN: %python %p/Inputs/code-signature-check.py %t.copy 16544 304 0 16544

	--- !mach-o			--- !mach-o
	FileHeader:			FileHeader:
	magic: 0xFEEDFACF			magic: 0xFEEDFACF
	cputype: 0x01000007			cputype: 0x1000007
	cpusubtype: 0x80000003			cpusubtype: 0x3
	filetype: 0x00000002			filetype: 0x2
	ncmds: 2			ncmds: 15
	sizeofcmds: 88			sizeofcmds: 760
	flags: 0x00218085			flags: 0x200085
	reserved: 0x00000000			reserved: 0x0
	LoadCommands:			LoadCommands:
	- cmd: LC_SEGMENT_64			- cmd: LC_SEGMENT_64
	cmdsize: 72			cmdsize: 72
				segname: __PAGEZERO
				vmaddr: 0
				vmsize: 4294967296
				fileoff: 0
				filesize: 0
				maxprot: 0
				initprot: 0
				nsects: 0
				flags: 0
				- cmd: LC_SEGMENT_64
				cmdsize: 232
				segname: __TEXT
				vmaddr: 4294967296
				vmsize: 16384
				fileoff: 0
				filesize: 16384
				maxprot: 5
				initprot: 5
				nsects: 2
				flags: 0
				Sections:
				- sectname: __text
				segname: __TEXT
				addr: 0x100003FA0
				size: 15
				offset: 0x3FA0
				align: 4
				reloff: 0x0
				nreloc: 0
				flags: 0x80000400
				reserved1: 0x0
				reserved2: 0x0
				reserved3: 0x0
				content: 554889E531C0C745FC000000005DC3
				- sectname: __unwind_info
				segname: __TEXT
				addr: 0x100003FB0
				size: 72
				offset: 0x3FB0
				align: 2
				reloff: 0x0
				nreloc: 0
				flags: 0x0
				reserved1: 0x0
				reserved2: 0x0
				reserved3: 0x0
				content: 010000001C000000000000001C000000000000001C00000002000000A03F00003400000034000000B03F00000000000034000000030000000C000100100001000000000000000001
				- cmd: LC_SEGMENT_64
				cmdsize: 72
	segname: __LINKEDIT			segname: __LINKEDIT
	vmaddr: 4294979584			vmaddr: 4294983680
	vmsize: 4096			vmsize: 16384
	fileoff: 120			fileoff: 16384
	filesize: 24			filesize: 440
	maxprot: 7			maxprot: 1
	initprot: 1			initprot: 1
	nsects: 0			nsects: 0
	flags: 0			flags: 0
				- cmd: LC_DYLD_INFO_ONLY
				cmdsize: 48
				rebase_off: 0
				rebase_size: 0
				bind_off: 0
				bind_size: 0
				weak_bind_off: 0
				weak_bind_size: 0
				lazy_bind_off: 0
				lazy_bind_size: 0
				export_off: 16384
				export_size: 48
				- cmd: LC_SYMTAB
				cmdsize: 24
				symoff: 16440
				nsyms: 3
				stroff: 16488
				strsize: 48
				- cmd: LC_DYSYMTAB
				cmdsize: 80
				ilocalsym: 0
				nlocalsym: 0
				iextdefsym: 0
				nextdefsym: 2
				iundefsym: 2
				nundefsym: 1
				tocoff: 0
				ntoc: 0
				modtaboff: 0
				nmodtab: 0
				extrefsymoff: 0
				nextrefsyms: 0
				indirectsymoff: 0
				nindirectsyms: 0
				extreloff: 0
				nextrel: 0
				locreloff: 0
				nlocrel: 0
				- cmd: LC_LOAD_DYLINKER
				cmdsize: 32
				name: 12
				Content: '/usr/lib/dyld'
				ZeroPadBytes: 7
				- cmd: LC_UUID
				cmdsize: 24
				uuid: 42759668-1CBA-3094-8E2D-F01E1A66E815
				- cmd: LC_BUILD_VERSION
				cmdsize: 32
				platform: 1
				minos: 720896
				sdk: 721664
				ntools: 1
				Tools:
				- tool: 3
				version: 42600704
				- cmd: LC_SOURCE_VERSION
				cmdsize: 16
				version: 0
				- cmd: LC_MAIN
				cmdsize: 24
				entryoff: 16288
				stacksize: 0
				- cmd: LC_LOAD_DYLIB
				cmdsize: 56
				dylib:
				name: 24
				timestamp: 2
				current_version: 84698117
				compatibility_version: 65536
				Content: '/usr/lib/libSystem.B.dylib'
				ZeroPadBytes: 6
				- cmd: LC_FUNCTION_STARTS
				cmdsize: 16
				dataoff: 16432
				datasize: 8
				- cmd: LC_DATA_IN_CODE
				cmdsize: 16
				dataoff: 16440
				datasize: 0
	- cmd: LC_CODE_SIGNATURE			- cmd: LC_CODE_SIGNATURE
	cmdsize: 16			cmdsize: 16
	dataoff: 128			dataoff: 16544
	datasize: 16			datasize: 280
				LinkEditData:
				ExportTrie:
				TerminalSize: 0
				NodeOffset: 0
				Name: ''
				Flags: 0x0
				Address: 0x0
				Other: 0x0
				ImportName: ''
				Children:
				- TerminalSize: 0
				NodeOffset: 5
				Name: _
				Flags: 0x0
				Address: 0x0
				Other: 0x0
				ImportName: ''
				Children:
				- TerminalSize: 2
				NodeOffset: 33
				Name: _mh_execute_header
				Flags: 0x0
				Address: 0x0
				Other: 0x0
				ImportName: ''
				- TerminalSize: 3
				NodeOffset: 37
				Name: main
				Flags: 0x0
				Address: 0x3FA0
				Other: 0x0
				ImportName: ''
				NameList:
				- n_strx: 2
				n_type: 0xF
				n_sect: 1
				n_desc: 16
				n_value: 4294967296
				- n_strx: 22
				n_type: 0xF
				n_sect: 1
				n_desc: 0
				n_value: 4294983584
				- n_strx: 28
				n_type: 0x1
				n_sect: 0
				n_desc: 256
				n_value: 0
				StringTable:
				- ' '
				- __mh_execute_header
				- _main
				- dyld_stub_binder
				- ''
				- ''
				- ''
	...			...

llvm/tools/llvm-objcopy/MachO/MachOLayoutBuilder.h

	#include "Object.h"			#include "Object.h"

	namespace llvm {			namespace llvm {
	namespace objcopy {			namespace objcopy {
	namespace macho {			namespace macho {

	class MachOLayoutBuilder {			class MachOLayoutBuilder {
	Object &O;			Object &O;
				const StringRef OutputFilename;
				jhenderson (Done): We don't usually bother marking member variables as `const` in the llvm-objcopy code, so I'd drop the `const` here and in the constructor parameter argument.
	bool Is64Bit;			bool Is64Bit;
	uint64_t PageSize;			uint64_t PageSize;

	// Points to the __LINKEDIT segment if it exists.			// Points to the __LINKEDIT segment if it exists.
	MachO::macho_load_command *LinkEditLoadCommand = nullptr;			MachO::macho_load_command *LinkEditLoadCommand = nullptr;
	StringTableBuilder StrTableBuilder;			StringTableBuilder StrTableBuilder;

	uint32_t computeSizeOfCmds() const;			uint32_t computeSizeOfCmds() const;
	void constructStringTable();			void constructStringTable();
	void updateSymbolIndexes();			void updateSymbolIndexes();
	void updateDySymTab(MachO::macho_load_command &MLC);			void updateDySymTab(MachO::macho_load_command &MLC);
	uint64_t layoutSegments();			uint64_t layoutSegments();
	uint64_t layoutRelocations(uint64_t Offset);			uint64_t layoutRelocations(uint64_t Offset);
	Error layoutTail(uint64_t Offset);			Error layoutTail(uint64_t Offset);

	static StringTableBuilder::Kind getStringTableBuilderKind(const Object &O,			static StringTableBuilder::Kind getStringTableBuilderKind(const Object &O,
	bool Is64Bit);			bool Is64Bit);

	public:			public:
	MachOLayoutBuilder(Object &O, bool Is64Bit, uint64_t PageSize)			MachOLayoutBuilder(Object &O, bool Is64Bit, uint64_t PageSize,
	: O(O), Is64Bit(Is64Bit), PageSize(PageSize),			const StringRef OutputFilename)
				: O(O), OutputFilename(OutputFilename), Is64Bit(Is64Bit),
				PageSize(PageSize),
	StrTableBuilder(getStringTableBuilderKind(O, Is64Bit)) {}			StrTableBuilder(getStringTableBuilderKind(O, Is64Bit)) {}

	// Recomputes and updates fields in the given object such as file offsets.			// Recomputes and updates fields in the given object such as file offsets.
	Error layout();			Error layout();

	StringTableBuilder &getStringTableBuilder() { return StrTableBuilder; }			StringTableBuilder &getStringTableBuilder() { return StrTableBuilder; }
	};			};

	} // end namespace macho			} // end namespace macho
	} // end namespace objcopy			} // end namespace objcopy
	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_OBJCOPY_MACHO_MACHOLAYOUTBUILDER_H			#endif // LLVM_OBJCOPY_MACHO_MACHOLAYOUTBUILDER_H

llvm/tools/llvm-objcopy/MachO/MachOLayoutBuilder.cpp

//===- MachOLayoutBuilder.cpp ------------------------------------ C++ --===//		//===- MachOLayoutBuilder.cpp ------------------------------------ C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "MachOLayoutBuilder.h"		#include "MachOLayoutBuilder.h"
		#include "../CommonConfig.h"
		#include "llvm/Object/MachO.h"
#include "llvm/Support/Alignment.h"		#include "llvm/Support/Alignment.h"
#include "llvm/Support/Errc.h"		#include "llvm/Support/Errc.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"

using namespace llvm;		using namespace llvm;
using namespace llvm::objcopy::macho;		using namespace llvm::objcopy::macho;

StringTableBuilder::Kind		StringTableBuilder::Kind
▲ Show 20 Lines • Show All 239 Lines • ▼ Show 20 Lines	uint64_t StartOfSymbols =
StartOfLinkerOptimizationHint + O.LinkerOptimizationHint.Data.size();		StartOfLinkerOptimizationHint + O.LinkerOptimizationHint.Data.size();
uint64_t StartOfIndirectSymbols =		uint64_t StartOfIndirectSymbols =
StartOfSymbols + NListSize * O.SymTable.Symbols.size();		StartOfSymbols + NListSize * O.SymTable.Symbols.size();
uint64_t StartOfSymbolStrings =		uint64_t StartOfSymbolStrings =
StartOfIndirectSymbols +		StartOfIndirectSymbols +
sizeof(uint32_t) * O.IndirectSymTable.Symbols.size();		sizeof(uint32_t) * O.IndirectSymTable.Symbols.size();
uint64_t StartOfCodeSignature =		uint64_t StartOfCodeSignature =
StartOfSymbolStrings + StrTableBuilder.getSize();		StartOfSymbolStrings + StrTableBuilder.getSize();
if (O.CodeSignatureCommandIndex)		uint32_t CodeSignatureSectionSize = 0;
		if (O.CodeSignatureCommandIndex) {
StartOfCodeSignature = alignTo(StartOfCodeSignature, 16);		StartOfCodeSignature = alignTo(StartOfCodeSignature, 16);
		uint32_t TextSegmentFileOff = 0;
		uint32_t TextSegmentFileSize = 0;
		if (O.TextSegmentCommandIndex) {
		const LoadCommand &TextSegmentLoadCommand =
		O.LoadCommands[*O.TextSegmentCommandIndex];
		TextSegmentFileOff = *TextSegmentLoadCommand.getSegmentFileOffset();
		TextSegmentFileSize = *TextSegmentLoadCommand.getSegmentFileSize();
		}
		object::CodeSignatureSection SectionBuilder = object::CodeSignatureSection(
		StartOfCodeSignature, OutputFilename,
		static_cast<MachO::HeaderFileType>(O.Header.FileType),
		TextSegmentFileOff, TextSegmentFileSize);
		CodeSignatureSectionSize = SectionBuilder.getSize();
		}
uint64_t LinkEditSize =		uint64_t LinkEditSize =
(StartOfCodeSignature + O.CodeSignature.Data.size()) - StartOfLinkEdit;		(StartOfCodeSignature + CodeSignatureSectionSize) - StartOfLinkEdit;

// Now we have determined the layout of the contents of the __LINKEDIT		// Now we have determined the layout of the contents of the __LINKEDIT
// segment. Update its load command.		// segment. Update its load command.
if (LinkEditLoadCommand) {		if (LinkEditLoadCommand) {
MachO::macho_load_command *MLC = LinkEditLoadCommand;		MachO::macho_load_command *MLC = LinkEditLoadCommand;
switch (LinkEditLoadCommand->load_command_data.cmd) {		switch (LinkEditLoadCommand->load_command_data.cmd) {
case MachO::LC_SEGMENT:		case MachO::LC_SEGMENT:
MLC->segment_command_data.cmdsize = sizeof(MachO::segment_command);		MLC->segment_command_data.cmdsize = sizeof(MachO::segment_command);
Show All 11 Lines	Error MachOLayoutBuilder::layoutTail(uint64_t Offset) {
}		}

for (LoadCommand &LC : O.LoadCommands) {		for (LoadCommand &LC : O.LoadCommands) {
auto &MLC = LC.MachOLoadCommand;		auto &MLC = LC.MachOLoadCommand;
auto cmd = MLC.load_command_data.cmd;		auto cmd = MLC.load_command_data.cmd;
switch (cmd) {		switch (cmd) {
case MachO::LC_CODE_SIGNATURE:		case MachO::LC_CODE_SIGNATURE:
MLC.linkedit_data_command_data.dataoff = StartOfCodeSignature;		MLC.linkedit_data_command_data.dataoff = StartOfCodeSignature;
MLC.linkedit_data_command_data.datasize = O.CodeSignature.Data.size();		MLC.linkedit_data_command_data.datasize = CodeSignatureSectionSize;
break;		break;
case MachO::LC_SYMTAB:		case MachO::LC_SYMTAB:
MLC.symtab_command_data.symoff = StartOfSymbols;		MLC.symtab_command_data.symoff = StartOfSymbols;
MLC.symtab_command_data.nsyms = O.SymTable.Symbols.size();		MLC.symtab_command_data.nsyms = O.SymTable.Symbols.size();
MLC.symtab_command_data.stroff = StartOfSymbolStrings;		MLC.symtab_command_data.stroff = StartOfSymbolStrings;
MLC.symtab_command_data.strsize = StrTableBuilder.getSize();		MLC.symtab_command_data.strsize = StrTableBuilder.getSize();
break;		break;
case MachO::LC_DYSYMTAB: {		case MachO::LC_DYSYMTAB: {

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp

Show First 20 Lines • Show All 398 Lines • ▼ Show 20 Lines	Error objcopy::macho::executeObjcopyOnBinary(const CommonConfig &Config,
case Triple::ArchType::aarch64:		case Triple::ArchType::aarch64:
case Triple::ArchType::aarch64_32:		case Triple::ArchType::aarch64_32:
PageSize = 16384;		PageSize = 16384;
break;		break;
default:		default:
PageSize = 4096;		PageSize = 4096;
}		}

MachOWriter Writer(**O, In.is64Bit(), In.isLittleEndian(), PageSize, Out);		MachOWriter Writer(**O, In.is64Bit(), In.isLittleEndian(), PageSize, Out,
		Config.OutputFilename);
if (auto E = Writer.finalize())		if (auto E = Writer.finalize())
return E;		return E;
return Writer.write();		return Writer.write();
}		}

Error objcopy::macho::executeObjcopyOnMachOUniversalBinary(		Error objcopy::macho::executeObjcopyOnMachOUniversalBinary(
const MultiFormatConfig &Config, const MachOUniversalBinary &In,		const MultiFormatConfig &Config, const MachOUniversalBinary &In,
raw_ostream &Out) {		raw_ostream &Out) {

llvm/tools/llvm-objcopy/MachO/MachOReader.cpp

Show First 20 Lines • Show All 110 Lines • ▼ Show 20 Lines	assert(S.NReloc == S.Relocations.size() &&
"Incorrect number of relocations");		"Incorrect number of relocations");
}		}
return std::move(Sections);		return std::move(Sections);
}		}

Error MachOReader::readLoadCommands(Object &O) const {		Error MachOReader::readLoadCommands(Object &O) const {
// For MachO sections indices start from 1.		// For MachO sections indices start from 1.
uint32_t NextSectionIndex = 1;		uint32_t NextSectionIndex = 1;
		static const char TextSegmentName[] = "__TEXT";
for (auto LoadCmd : MachOObj.load_commands()) {		for (auto LoadCmd : MachOObj.load_commands()) {
LoadCommand LC;		LoadCommand LC;
switch (LoadCmd.C.cmd) {		switch (LoadCmd.C.cmd) {
case MachO::LC_CODE_SIGNATURE:		case MachO::LC_CODE_SIGNATURE:
O.CodeSignatureCommandIndex = O.LoadCommands.size();		O.CodeSignatureCommandIndex = O.LoadCommands.size();
break;		break;
case MachO::LC_SEGMENT:		case MachO::LC_SEGMENT:
		if (StringRef(
		reinterpret_cast<MachO::segment_command const *>(LoadCmd.Ptr)
		->segname) == TextSegmentName)
		O.TextSegmentCommandIndex = O.LoadCommands.size();

if (Expected<std::vector<std::unique_ptr<Section>>> Sections =		if (Expected<std::vector<std::unique_ptr<Section>>> Sections =
extractSections<MachO::section, MachO::segment_command>(		extractSections<MachO::section, MachO::segment_command>(
LoadCmd, MachOObj, NextSectionIndex))		LoadCmd, MachOObj, NextSectionIndex))
LC.Sections = std::move(*Sections);		LC.Sections = std::move(*Sections);
else		else
return Sections.takeError();		return Sections.takeError();
break;		break;
case MachO::LC_SEGMENT_64:		case MachO::LC_SEGMENT_64:
		if (StringRef(
		reinterpret_cast<MachO::segment_command_64 const *>(LoadCmd.Ptr)
		->segname) == TextSegmentName)
		O.TextSegmentCommandIndex = O.LoadCommands.size();

if (Expected<std::vector<std::unique_ptr<Section>>> Sections =		if (Expected<std::vector<std::unique_ptr<Section>>> Sections =
extractSections<MachO::section_64, MachO::segment_command_64>(		extractSections<MachO::section_64, MachO::segment_command_64>(
LoadCmd, MachOObj, NextSectionIndex))		LoadCmd, MachOObj, NextSectionIndex))
LC.Sections = std::move(*Sections);		LC.Sections = std::move(*Sections);
else		else
return Sections.takeError();		return Sections.takeError();
break;		break;
case MachO::LC_SYMTAB:		case MachO::LC_SYMTAB:

llvm/tools/llvm-objcopy/MachO/MachOWriter.h

namespace llvm {		namespace llvm {
class Error;		class Error;

namespace objcopy {		namespace objcopy {
namespace macho {		namespace macho {

class MachOWriter {		class MachOWriter {
Object &O;		Object &O;
		const StringRef OutputFilename;
bool Is64Bit;		bool Is64Bit;
bool IsLittleEndian;		bool IsLittleEndian;
uint64_t PageSize;		uint64_t PageSize;
std::unique_ptr<WritableMemoryBuffer> Buf;		std::unique_ptr<WritableMemoryBuffer> Buf;
raw_ostream &Out;		raw_ostream &Out;
MachOLayoutBuilder LayoutBuilder;		MachOLayoutBuilder LayoutBuilder;

size_t headerSize() const;		size_t headerSize() const;
Show All 18 Lines	class MachOWriter {
void writeCodeSignatureData();		void writeCodeSignatureData();
void writeDataInCodeData();		void writeDataInCodeData();
void writeLinkerOptimizationHint();		void writeLinkerOptimizationHint();
void writeFunctionStartsData();		void writeFunctionStartsData();
void writeTail();		void writeTail();

public:		public:
MachOWriter(Object &O, bool Is64Bit, bool IsLittleEndian, uint64_t PageSize,		MachOWriter(Object &O, bool Is64Bit, bool IsLittleEndian, uint64_t PageSize,
raw_ostream &Out)		raw_ostream &Out, const StringRef OutputFilename)
: O(O), Is64Bit(Is64Bit), IsLittleEndian(IsLittleEndian),		: O(O), OutputFilename(OutputFilename), Is64Bit(Is64Bit),
PageSize(PageSize), Out(Out), LayoutBuilder(O, Is64Bit, PageSize) {}		IsLittleEndian(IsLittleEndian), PageSize(PageSize), Out(Out),
		LayoutBuilder(O, Is64Bit, PageSize, OutputFilename) {}

size_t totalSize() const;		size_t totalSize() const;
Error finalize();		Error finalize();
Error write();		Error write();
};		};

} // end namespace macho		} // end namespace macho
} // end namespace objcopy		} // end namespace objcopy
} // end namespace llvm		} // end namespace llvm

llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp

//===- MachOWriter.cpp ------------------------------------------*- C++ -*-===// //===- MachOWriter.cpp ------------------------------------------*- C++ -*-===//

// //

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information. // See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

// //

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

#include "MachOWriter.h" #include "MachOWriter.h"

#include "../CommonConfig.h"

jhenderson (Done):

#include "MachOWriter.h"

- #include "../CommonConfig.h"

+ #include "CommonConfig.h"

#include "MachOLayoutBuilder.h"

You should be able to include this as in the inline edit (cf ELF/ELFObjcopy.cpp which doesn't have the relative include). Same goes elsewhere.

#include "MachOLayoutBuilder.h" #include "MachOLayoutBuilder.h"

#include "Object.h" #include "Object.h"

#include "llvm/ADT/STLExtras.h" #include "llvm/ADT/STLExtras.h"

#include "llvm/BinaryFormat/MachO.h" #include "llvm/BinaryFormat/MachO.h"

#include "llvm/Object/MachO.h" #include "llvm/Object/MachO.h"

#include "llvm/Support/Errc.h" #include "llvm/Support/Errc.h"

#include "llvm/Support/ErrorHandling.h" #include "llvm/Support/ErrorHandling.h"

#include <memory> #include <memory>

▲ Show 20 Lines • Show All 401 Lines • ▼ Show 20 Lines const MachO::linkedit_data_command &LinkEditDataCommand =

O.LoadCommands[*LCIndex].MachOLoadCommand.linkedit_data_command_data; O.LoadCommands[*LCIndex].MachOLoadCommand.linkedit_data_command_data;

char *Out = (char *)Buf->getBufferStart() + LinkEditDataCommand.dataoff; char *Out = (char *)Buf->getBufferStart() + LinkEditDataCommand.dataoff;

assert((LinkEditDataCommand.datasize == LD.Data.size()) && assert((LinkEditDataCommand.datasize == LD.Data.size()) &&

"Incorrect data size"); "Incorrect data size");

memcpy(Out, LD.Data.data(), LD.Data.size()); memcpy(Out, LD.Data.data(), LD.Data.size());

} }

void MachOWriter::writeCodeSignatureData() { void MachOWriter::writeCodeSignatureData() {

return writeLinkData(O.CodeSignatureCommandIndex, O.CodeSignature); if (O.CodeSignatureCommandIndex) {

MachO::linkedit_data_command &CodeSignatureLoadCommand =

O.LoadCommands[*O.CodeSignatureCommandIndex]

.MachOLoadCommand.linkedit_data_command_data;

uint32_t TextSegmentFileOff = 0;

uint32_t TextSegmentFileSize = 0;

if (O.TextSegmentCommandIndex) {

const LoadCommand &TextSegmentLoadCommand =

O.LoadCommands[*O.TextSegmentCommandIndex];

TextSegmentFileOff = *TextSegmentLoadCommand.getSegmentFileOffset();

TextSegmentFileSize = *TextSegmentLoadCommand.getSegmentFileSize();

}

object::CodeSignatureSection SignatureBuilder =

alexander-shaposhnikov (Not Done):

It looks like I've missed the previous diff https://reviews.llvm.org/D109803, sorry about being late.

1/ MachO.h is probably not the ideal place for this functionality, according to the top file comment "MachO.h declares the MachOObjectFile class". My first impression is that the public declaration probably should be placed into a separate file, e.g. "MachOCodeSignature.h" (and we need a detailed comment explaining what this class or function does)

2/ The responsibility of the class CodeSignatureSection seems to be somewhat unclear. Perhaps, we don't really need a class here at all. What would you say to just creating a function, e.g. std::string buildCodeSignatureStab(...) or void writeCodeSignatureStab(...) with some clear separation of the input / output and removing the unrelated pieces of functionality (e.g. msync on the buffer, this is probably not suitable for a library function) ? (and not expose the implementation details in the header file).

3/ Some minor comments, e.g. stripOutputFilePath - I suspect Support/Path.h might already have some helper utilities

What I would recommend here - can we (temporarily) take a step back and reopen & revert D109803, if it were not a library interface I would not be so worried, but for some common and frequently used libraries like libObject I would highly suggest we should clean up the interface first and I'm more than happy to help.

drodriguez (Author, Not Done):

Hi Alex,

We would prefer to fix forward instead of reverting, if possible. As it stands, D109803 hasn't changed any behavior; it is just "dormant" code as far as LLVM is concerned. Only this diff uses that functionality.

I will review your comment next week more carefully and provide better answers.

1/ I think we can do this. It should not be a problem.
2/ The class tries to imitate the existing code in LLD, which was a class. The class has two related pieces of functionality: measuring the size and serializing. Both need the same inputs, so it made sense to tie data and behaviour together.
2b/ msync on the buffer is actually important, and llvm-objcopy might not work unless we make that call. IIRC, without it, if one writes to the same file again while that file is mmap'ed into memory, the kernel will not realize that the signature has changed and will not evaluate it again. I can look for a reference that explains the problem in a little more detail.
3/ We realized that it was not portable to Windows, and Nuri changed it this afternoon to use llvm::path instead. This diff has the fix, since the previous one had already landed.
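To illustrate the portability point in item 3: searching for "/" by hand only handles one separator convention, whereas a path library follows the host's rules. The following is a minimal sketch that uses the C++17 standard library as a stand-in. `outputBaseName` is a hypothetical name, and the actual patch would presumably rely on the helpers in llvm/Support/Path.h (for example `llvm::sys::path::filename`) rather than `std::filesystem`.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical stand-in for stripOutputFilePath: returns the last path
// component using the host's separator rules instead of searching for "/".
// Assumption: std::filesystem is used here only for illustration.
inline std::string outputBaseName(const std::string &OutputFilePath) {
  return std::filesystem::path(OutputFilePath).filename().string();
}
```

A path without any directory component is returned unchanged, which matches the early-return branch of the hand-written version.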

Thanks for offering to help. Any feedback is welcome. We will try to get to the best version of this that satisfies everyone's requirements.

alexander-shaposhnikov (Not Done):

Hi! Many thanks!
My 0.02$ . The current design looks a little bit suboptimal, especially for a library, so I suggest let's improve the interface first and put things into the right place. It's a bit hard to post comments outside of the code context, but yeah, I'm late to the party. The LLD-specific details ("CodeSignatureSection") can still live in the LLD codebase and make use of the function from libObject.
If there are no significant performance differences I would be biased towards readability / simplicity, e.g. one can consider something like this (I didn't verify the details, so maybe some parameters are unnecessary or, instead, missing)

SmallVector<char, 64> buildCodeSignatureStab(
    uint64_t SignatureOffset,
    MachO::HeaderFileType FileType, StringRef FilePath,
    uint64_t TextSegmentOffset, StringRef TextSegmentContent)

and in LLD the method CodeSignatureSection::finalize would call this function - would that be sufficient ? (maybe I'm missing something, feel free to correct me). (Plus in this case we will avoid having two different CodeSignatureSection classes)
and (mostly motivated by the "separation of concerns") I'd leave the unfortunate workaround with msync on the application's side, hopefully it'll be gone in the future.

drodriguez (Author, Not Done):

About the ideal place for this code:

We thought about libObject and specifically MachO.h because it seemed like the best place, but we are fine relocating the code if necessary. We have also looked at libBinaryFormat, which includes some support for MachO and might be a good place for this. The basic structs for the code signature (e.g. CS_CodeDirectory) are there, so it might be a good idea to keep everything together. Our only requirement is that it should be accessible to both llvm-objcopy and LLD to avoid code duplication.

These are not LLD specific details, as far as we understand, but part of how MachO binaries are put together, and every tool that creates a MachO binary will need to play by these rules (they apply to each slice independently, so in our testing Universal MachO does not need changes, but there might be edge cases we are not aware of).

About the shape of the code:

We don’t mind removing the class and having a couple of free functions instead, but the responsibilities of the class may grow in the future, and we would like to avoid functions with a lot of parameters (there are already four that both the serialization and the size calculation would need to receive).

About your proposal of buildCodeSignatureStab: is there an example of that pattern already in the code? We have looked for similar constructs and cannot find one. Also, since “Stab” is a term of art related to dSYM, maybe it is not the best name to use (alternatives could be something like “Blob”, or “serialize” instead of “build”). Using SmallVector might also be limiting here, since the size of the signature depends on the size of the binary: every 4 KB of binary needs an extra 32 bytes. I have seen signatures of 15 KB for LLVM binaries. We would prefer the versions that write directly into memory to avoid excessive copying of data.
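The size argument can be made concrete with a small sketch that mirrors the `getBlockCount` and `getAllHeadersSize` arithmetic shown in the diff. The 4 KB block size and 32-byte hash size come from the discussion; `FixedHeadersSize = 112` and `Align = 16` are assumptions chosen for illustration, and all function names here are hypothetical rather than part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stated in the discussion: one 32-byte hash per 4 KB block of the binary.
constexpr uint64_t BlockSize = 4096;
constexpr uint64_t HashSize = 32;
// Assumptions for illustration only; the real values follow from the
// CS_SuperBlob / CS_BlobIndex / CS_CodeDirectory layouts.
constexpr uint64_t FixedHeadersSize = 112;
constexpr uint64_t Align = 16;

// Rounds Value up to the next multiple of Alignment.
inline uint64_t alignUp(uint64_t Value, uint64_t Alignment) {
  assert(Alignment != 0 && "alignment must be non-zero");
  return (Value + Alignment - 1) / Alignment * Alignment;
}

// Number of blocks covering everything that precedes the signature
// (ceiling division, as in getBlockCount).
inline uint64_t blockCount(uint64_t SignatureOffset) {
  return (SignatureOffset + BlockSize - 1) / BlockSize;
}

// Fixed headers plus the NUL-terminated file name, aligned
// (as in getAllHeadersSize).
inline uint64_t allHeadersSize(size_t FileNameLength) {
  return alignUp(FixedHeadersSize + FileNameLength + 1, Align);
}

// Total size: headers plus one hash per block.
inline uint64_t signatureSize(uint64_t SignatureOffset, size_t FileNameLength) {
  return allHeadersSize(FileNameLength) + blockCount(SignatureOffset) * HashSize;
}
```

Under these assumptions, a signature starting at offset 16544 covers 5 blocks, so it carries 160 bytes of hashes. Conversely, 15 KB of hashes corresponds to 480 blocks, that is, a binary of just under 2 MB.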

Finally, about msync:

While we can move the msync calls into the application code, and add a note in the documentation of the method that the caller must ensure that msync gets invoked, we think it is better to keep the msync only in one place with a clear explanation of why it is necessary.

The details are in https://openradar.appspot.com/FB8914231 but in short, when mmaping an executable, the kernel seems to cache the results of the signature verification and might not check the validity of the binary again, even if the signature gets regenerated. This is very typical while iterating on a binary, creating it over and over, so it can happen in both LLD and llvm-objcopy (especially in llvm-strip).

Apple might fix their side eventually, but until then we need the workaround everywhere, so we think a central place is better.
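A minimal sketch of what such a centralized workaround could look like, assuming the output file is memory-mapped and the signature has already been written into the mapping. `invalidateSignatureCache` is a hypothetical name, not the interface from D109803. The call is guarded so that it only does anything on Apple hosts, since the problem described above is specific to the macOS kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical helper: asks the kernel to discard cached state for the
// mapped output range after the signature has been (re)written.
// MappingStart is assumed to be the page-aligned start of the mapping.
// Returns true on success, or when no workaround is needed on this host.
inline bool invalidateSignatureCache(void *MappingStart, size_t Length) {
#if defined(__APPLE__)
  return ::msync(MappingStart, Length, MS_INVALIDATE) == 0;
#else
  (void)MappingStart;
  (void)Length;
  return true;
#endif
}
```

Keeping the guard and the explanation inside one helper means each tool only has to call it once after writing the signature, whichever library the helper ends up living in.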

alexander-shaposhnikov (Not Done):

> is there an example of that pattern already in the code. We have looked for similar constructs and we cannot find it.

If I'm not mistaken libObject was initially designed to present a read-only view of object files and closely models the binary formats, so, in general, what was added in https://reviews.llvm.org/D109803 does not have many analogs or counterparts there (to the best of my knowledge), although libObject contains some functionality to write archives and universal binaries.

I find the interface introduced in https://reviews.llvm.org/D109803 quite confusing and still think it would be better to revisit that decision (and revert for now) before it has propagated further. This can be subjective so I'm happy if somebody else familiar with libObject weighs in, maybe I'm missing something.

If I am not mistaken this code doesn't use the signer's identity in any way so it's more like a collection of hashes.
That's why above I called it a "stab". I think it also indicates that we need a detailed comment (in the source code) that should clarify these aspects.

> These are not LLD specific details, as far as we understand, but part of how MachO binaries

LLD uses the term "section" to model various parts of the LINKEDIT segment, however, the binary format does not use this term (to the best of my knowledge) for this purpose. The class object::CodeSignatureSection looks very similar to what was initially implemented in LLD but I'm not sure it's a good fit for libObject, this is what I was referring to as "LLD-specific details" above.

Subscribing more people who are familiar with libObject & Mach-O:
cc @mtrent, @lattner, @steven_wu

drodriguez (Author, Not Done):

It seems fair that libObject is mainly for reading files. libBinaryFormat is where some of the structs used by LC_CODE_SIGNATURE are defined. There are small functions in there for MachO (mostly getters, but some setters). Would that be a good place? It also has a MsgPackWriter, which seems mostly unrelated, but seems to show that "writers" are OK in that library.

About being confusing: we are fine changing the couple of methods into free functions if that allows us to advance further. We chose a class because it was already that shape in LLD, and it made sense to keep that design. Would that be OK for everyone? It would be one function for measuring the size, and one function to write the headers of the data contained in LC_CODE_SIGNATURE and calculate the appropriate hashes.

About the name "code signing": you are completely right that this does not use the signer identity at all and it is just a bunch of hashes. I think it should be understood more as a binary "signature" than as a cryptographic signature of the code. It is mostly an HMAC, but without the cryptographic pieces. From what I understand, it is just a way for the linker to output "verified" binaries on Apple Silicon Macs without having to set up signing identities and stuff like that.

We will add more comments to the code to try to explain better the purpose and which functionality is implemented.

About the term "section": yes, LLD implements the linkedit segment with synthetic sections, and that name slipped in; I did not realize what you were referring to. These are not what MachO understands as sections. We will change that and probably make it clear that it is related to CS_CodeDirectory, CS_SuperBlob, and CS_BlobIndex.

In summary, if everyone agrees:

1. Move the functionality out of libObject, probably into libBinaryFormat, where the struct CS_CodeDirectory already lives.
2. Transform the functionality into a couple of free functions that receive all the necessary information.
3. Do not mention "Section" in relation to any of this functionality. Also try not to mention "code signature" that much, if a better name is available.
4. Address the feedback provided by James Henderson above.

We hope the plan sounds good to everyone. Let us know if someone has more feedback that we can integrate.
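As a rough illustration of the first of the two free functions proposed above (the one that measures the size), here is a minimal sketch. It is not the code from the patch. It assumes one hash per page of the binary, and the names `SignatureLayoutParams`, `alignUp`, `pageCount`, and `signatureSize`, as well as the 16-byte padding of the headers, are hypothetical. Real header sizes would come from the `CS_SuperBlob`, `CS_BlobIndex`, and `CS_CodeDirectory` structs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical parameter bundle; none of these names come from the patch.
struct SignatureLayoutParams {
  uint64_t CodeLimit;       // Number of bytes of the binary covered by hashes.
  uint64_t PageSize;        // Size of each hashed page, e.g. 4096.
  uint64_t HashSize;        // Size of one hash in bytes, e.g. 32 for SHA-256.
  uint64_t FixedHeaderSize; // Combined size of the fixed-size headers.
  uint64_t IdentifierSize;  // Length of the identifier, without the NUL.
};

// Rounds Value up to the next multiple of Align.
inline uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "alignment must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

// Number of pages needed to cover CodeLimit bytes. A partial last page
// still needs its own hash, so it counts as a full page here.
inline uint64_t pageCount(const SignatureLayoutParams &P) {
  assert(P.PageSize != 0 && "page size must be non-zero");
  return (P.CodeLimit + P.PageSize - 1) / P.PageSize;
}

// Total size: the headers plus the NUL-terminated identifier (padded to 16
// bytes, which is an assumption of this sketch), then one hash per page.
inline uint64_t signatureSize(const SignatureLayoutParams &P) {
  uint64_t Headers = alignUp(P.FixedHeaderSize + P.IdentifierSize + 1, 16);
  return Headers + pageCount(P) * P.HashSize;
}
```

For instance, with a code limit of 10000 bytes, 4096-byte pages, 32-byte hashes, 108 bytes of fixed headers, and a 5-character identifier, this sketch yields 3 pages and a total of 224 bytes. The point of a free function like this is that the layout step can reserve space before any hashing takes place.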


steven_wu (Unsubmitted)

Not Done

I don't think libBinaryFormat is a good place to implement the write.

I think a good design would be the following:

1. libObject (CodeSignatureSection) knows how to read the section and how to generate the section data.
2. CodeSignatureSection can return the generated section data in a buffer if it has to.
3. The binary write should be implemented inside the tools (like llvm-objcopy, lld, or yaml2obj). That is not really code duplication.


drodriguez (Author, Unsubmitted)

Not Done

@steven_wu: Thanks for the input. Let me try to see if we understand each other correctly.

Neither LLD nor llvm-objcopy is really interested in "reading" the contents of the data from the LC_CODE_SIGNATURE. I would not like to design an API that has no users. If we ever need to read from the section, then after Alexander's explanations above we know that the right place is libObject, and we will add the code there.
To generate the data, one needs access to the fully laid-out binary (slice). The process involves hashing every page (4 KB) of the binary and appending the hash of its contents. The data "header" can be written without access to the binary contents, but one still needs some information (mainly about offsets).
The binary write that we have tried to abstract away from the tools still requires the tools to provide a buffer where the data is written (and a buffer from which the binary contents are read, to generate the hashes). It was a happy coincidence that both tools worked with buffers, but if at some point in the future the writing mechanism has to be abstracted away, it should still be possible. I would prefer not to have the same process in each of the tools, since there are tricky pieces that would be bad to have in two parts of the code base (this was the main reason for bringing the LLD implementation into LLVM: we didn't want to simply duplicate the code).
We are fine moving the msync out of this and into the tools, if that's necessary to make everyone happy. We do not think it is the right decision, but we are fine making that change.
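The page-by-page hashing described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `standInHash` is a 64-bit FNV-1a hash used to keep the sketch free of dependencies, and it is NOT the hash a real signature would use. The names `standInHash` and `hashPages` are hypothetical and do not come from the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in 64-bit FNV-1a hash. A real signature would use a cryptographic
// hash; this one only keeps the sketch self-contained.
inline uint64_t standInHash(const uint8_t *Data, size_t Size) {
  uint64_t Hash = 14695981039346656037ull; // FNV offset basis.
  for (size_t I = 0; I < Size; ++I) {
    Hash ^= Data[I];
    Hash *= 1099511628211ull; // FNV prime.
  }
  return Hash;
}

// Walks the laid-out binary one page at a time and returns one hash per
// page, in file order. The last page may be shorter than PageSize.
inline std::vector<uint64_t> hashPages(const std::vector<uint8_t> &Binary,
                                       size_t PageSize = 4096) {
  std::vector<uint64_t> Hashes;
  if (PageSize == 0)
    return Hashes; // Nothing sensible to do with a zero page size.
  for (size_t Off = 0; Off < Binary.size(); Off += PageSize) {
    size_t Len = std::min(PageSize, Binary.size() - Off);
    Hashes.push_back(standInHash(Binary.data() + Off, Len));
  }
  return Hashes;
}
```

Because each page is hashed independently, changing a single byte only changes the hash of the page that contains it. This is also why the hashes cannot be produced until the rest of the binary has its final contents.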


drodriguez (Author, Unsubmitted)

Not Done

@steven_wu: Thanks for your feedback. You can read the next steps we plan to take in the other comment. I think many of the points will not make sense in the new approach, but if you think something is important, or you want to comment on my points above, we are happy to hear your feedback.


+       object::CodeSignatureSection(
+           CodeSignatureLoadCommand.dataoff, OutputFilename,
+           static_cast<MachO::HeaderFileType>(O.Header.FileType),
+           TextSegmentFileOff, TextSegmentFileSize);
+   SignatureBuilder.write(reinterpret_cast<uint8_t *>(Buf->getBufferStart()));
+ }
  }

  void MachOWriter::writeDataInCodeData() {
    return writeLinkData(O.DataInCodeCommandIndex, O.DataInCode);
  }

  void MachOWriter::writeLinkerOptimizationHint() {
    return writeLinkData(O.LinkerOptimizationHintCommandIndex,


llvm/tools/llvm-objcopy/MachO/Object.h


struct LoadCommand {
  std::vector<uint8_t> Payload;

  // Some load commands can contain (inside the payload) an array of sections,
  // though the contents of the sections are stored separately. The struct
  // Section describes only sections' metadata and where to find the
  // corresponding content inside the binary.
  std::vector<std::unique_ptr<Section>> Sections;

  // Returns the segment fileoff if the load command is a segment command

jhenderson (Unsubmitted)

Done

  std::vector<std::unique_ptr<Section>> Sections;
- // Returns the segment fileoff if the load command is a segment command
+ // Returns the segment fileoff if the load command is a segment command.
  Optional<uint64_t> getSegmentFileOffset() const;

Nit: comments must end in a full stop. Same in the two other cases below.


  Optional<uint64_t> getSegmentFileOffset() const;

  // Returns the segment filesize if the load command is a segment command
  Optional<uint64_t> getSegmentFileSize() const;

  // Returns the segment name if the load command is a segment command.
  Optional<StringRef> getSegmentName() const;

  // Returns the segment vm address if the load command is a segment command.
  Optional<uint64_t> getSegmentVMAddr() const;
};

// A symbol information. Fields which starts with "n_" are same as them in the


struct Object {
  /// The index LC_DYSYMTAB load comamnd if present.
  Optional<size_t> DySymTabCommandIndex;
  /// The index LC_DATA_IN_CODE load comamnd if present.
  Optional<size_t> DataInCodeCommandIndex;
  /// The index of LC_LINKER_OPTIMIZATIN_HINT load comamnd if present.
  Optional<size_t> LinkerOptimizationHintCommandIndex;
  /// The index LC_FUNCTION_STARTS load comamnd if present.
  Optional<size_t> FunctionStartsCommandIndex;
  /// The index of the LC_SEGMENT or LC_SEGMENT_64 load command
  /// corresponding to the __TEXT segment
  Optional<size_t> TextSegmentCommandIndex;

  BumpPtrAllocator Alloc;
  StringSaver NewSectionsContents;

  Object() : NewSectionsContents(Alloc) {}

  Error
  removeSections(function_ref<bool(const std::unique_ptr<Section> &)> ToRemove);


llvm/tools/llvm-objcopy/MachO/Object.cpp

  }

  /// Extracts a segment name from a string which is possibly non-null-terminated.
  static StringRef extractSegmentName(const char *SegName) {
    return StringRef(SegName,
                     strnlen(SegName, sizeof(MachO::segment_command::segname)));
  }

+ Optional<uint64_t> LoadCommand::getSegmentFileOffset() const {
+   const MachO::macho_load_command &MLC = MachOLoadCommand;
+   switch (MLC.load_command_data.cmd) {
+   case MachO::LC_SEGMENT:
+     return MLC.segment_command_data.fileoff;
+   case MachO::LC_SEGMENT_64:
+     return MLC.segment_command_64_data.fileoff;
+   default:
+     return None;
+   }
+ }
+
+ Optional<uint64_t> LoadCommand::getSegmentFileSize() const {
+   const MachO::macho_load_command &MLC = MachOLoadCommand;
+   switch (MLC.load_command_data.cmd) {
+   case MachO::LC_SEGMENT:
+     return MLC.segment_command_data.filesize;
+   case MachO::LC_SEGMENT_64:
+     return MLC.segment_command_64_data.filesize;
+   default:
+     return None;
+   }
+ }

  Optional<StringRef> LoadCommand::getSegmentName() const {
    const MachO::macho_load_command &MLC = MachOLoadCommand;
    switch (MLC.load_command_data.cmd) {
    case MachO::LC_SEGMENT:
      return extractSegmentName(MLC.segment_command_data.segname);
    case MachO::LC_SEGMENT_64:
      return extractSegmentName(MLC.segment_command_64_data.segname);
    default:


Revision Contents

Diff 373338
