This is an archive of the discontinued LLVM Phabricator instance.

Differential D131982

[llvm-objdump] Complete -chained_fixups support
ClosedPublic

Authored by BertalanD on Aug 16 2022, 11:07 AM.


Details

Reviewers

thakis
MaskRay
jhenderson

Commits

rG686d8ce1ab16: [llvm-objdump] Complete -chained_fixups support

Summary

This commit adds definitions for the dyld_chained_import* structs.
The imports array is now printed with llvm-otool -chained_fixups. This
completes this option's implementation.

A slight difference from cctools otool is that we don't yet dump the
raw bytes of the imports entries.

When Apple's effort to upstream their chained fixups code continues,
we'll replace this code with the then-upstreamed code. But we need
something in the meantime for testing ld64.lld's chained fixups code.

Diff Detail

Event Timeline

BertalanD created this revision. Aug 16 2022, 11:07 AM

Herald added a reviewer: jhenderson. Aug 16 2022, 11:07 AM

Herald added a project: Restricted Project.

Herald added subscribers: StephenFan, rupprecht, hiraditya.

BertalanD requested review of this revision. Aug 16 2022, 11:07 AM

Herald added a project: Restricted Project. Aug 16 2022, 11:07 AM

Herald added a subscriber: llvm-commits.

BertalanD added a parent revision: D131961: [llvm-objdump] Support dumping segment information in -chained_fixups. Aug 16 2022, 11:07 AM

Harbormaster completed remote builds in B181580: Diff 453075. Aug 16 2022, 11:07 AM

BertalanD added a child revision: D132036: [llvm-objdump] Add -dyld_info to llvm-otool. Aug 17 2022, 8:54 AM

Very nice!

llvm/include/llvm/BinaryFormat/MachO.h
2131	It'd be nice to use bit_cast from llvm/ADT/bit.h here, which will ensure that Raw is the same size as dyld_chained_import. Here this is fairly easy, I think:

auto Raw = bit_cast<uint32_t>(C);
sys::swapByteOrder(Raw);
C = bit_cast<dyld_chained_import>(Raw);

Below it's a bit more tricky, but I think this might work (all untested though):

auto Raw = bit_cast<std::array<uint32_t, 2>>(C);
sys::swapByteOrder(Raw[0]);
sys::swapByteOrder(Raw[1]);
C = bit_cast<dyld_chained_import_addend>(Raw);

(and analogously for the 64-bit version)
llvm/lib/Object/MachOObjectFile.cpp
4935	nit: no else after return

This function looks weird! It almost does `if (V == a) return a` a bunch of times followed by `return V`, and the special cases look pretty pointless at first. (I understand they're needed to sign-extend the 3 special values to int, while the input is 8 or 16 bits.) It feels like there should be a nicer way to write this. Maybe

if (Value == static_cast<T>(MachO::BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE) ||
    Value == static_cast<T>(MachO::BIND_SPECIAL_DYLIB_FLAT_LOOKUP) ||
    third case)
  return SignExtend32<sizeof(T) * 8>(Value);
return Value;

Not sure if that's actually better, but it makes the difference a bit more explicit maybe. Up to you :)
4969	Isn't `sizeof(dyld_chained_import)` clearer here? (likewise in the other two cases)
5005	This has GetEncodedOrdinal with upper-case G while the function above has it with lower-case g. Can this build? Maybe you uploaded a half-committed patch?
5028	nit: should this return a malformedError instead? it's data-dependent, so it's not truly unreachable. (but up to you)
llvm/tools/llvm-objdump/MachODump.cpp
1295	It makes sense to me to wait with this until the upstreaming has happened to see how to best implement this then.
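
A minimal sketch of the bit_cast suggestion for MachO.h line 2131, assuming nothing beyond the standard library. bitCastSketch and swapBytes32 are hypothetical stand-ins for llvm::bit_cast and sys::swapByteOrder, and swapStructSketch is an illustrative name rather than the function from the diff. The sketch only reverses bytes and checks sizes at compile time; it does not reorder the bitfields themselves.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Mirrors the struct added to BinaryFormat/MachO.h in this diff.
struct dyld_chained_import {
  uint32_t lib_ordinal : 8;
  uint32_t weak_import : 1;
  uint32_t name_offset : 23;
};

// Hypothetical stand-in for llvm::bit_cast: copies the object representation
// and refuses to compile when the two types differ in size.
template <typename To, typename From> To bitCastSketch(const From &Src) {
  static_assert(sizeof(To) == sizeof(From), "size mismatch");
  static_assert(std::is_trivially_copyable<To>::value &&
                    std::is_trivially_copyable<From>::value,
                "types must be trivially copyable");
  To Dst;
  std::memcpy(&Dst, &Src, sizeof(To));
  return Dst;
}

// Hypothetical stand-in for sys::swapByteOrder on a 32-bit value.
constexpr uint32_t swapBytes32(uint32_t V) {
  return (V >> 24) | ((V >> 8) & 0x0000ff00u) | ((V << 8) & 0x00ff0000u) |
         (V << 24);
}

// Reverses the four bytes of the struct through a same-sized integer.
inline void swapStructSketch(dyld_chained_import &C) {
  uint32_t Raw = bitCastSketch<uint32_t>(C);
  Raw = swapBytes32(Raw);
  C = bitCastSketch<dyld_chained_import>(Raw);
}

// Usage: swapping twice restores every field.
inline void roundTripExample() {
  dyld_chained_import C{};
  C.lib_ordinal = 0xfe;
  C.weak_import = 1;
  C.name_offset = 0x1234;
  swapStructSketch(C);
  swapStructSketch(C);
  assert(C.lib_ordinal == 0xfe && C.weak_import == 1 &&
         C.name_offset == 0x1234);
}
```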

This revision is now accepted and ready to land. Aug 17 2022, 3:38 PM

(All comments optional and up to you, in case that wasn't clear.)

BertalanD added inline comments. Aug 18 2022, 1:34 AM

llvm/include/llvm/BinaryFormat/MachO.h
2131	I'm not convinced that swapping the bytes is enough for endianness conversion. https://www.naic.edu/~phil/notes/bitfieldStorage.html suggests that the bits within the bytes would also need to be reversed. I'm going to update this diff to use bit masks and shifts like D132036.
llvm/lib/Object/MachOObjectFile.cpp
4935	I didn't know about `SignExtend32`, thank you for mentioning it. Your version does indeed look nicer than what I wrote.
5028	I'll add an error message to where we're computing `ImportSize`, and then this branch will be truly unreachable.
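
The sign-extension idea discussed for line 4935 can be sketched without LLVM headers as follows. signExtend32Sketch is a hypothetical stand-in for llvm::SignExtend32 (restricted here to widths below 32 bits), and getEncodedOrdinalSketch is an illustrative name. The three special ordinals are -1, -2 and -3 in the Mach-O format.

```cpp
#include <climits>
#include <cstdint>

// Special library ordinals as defined by the Mach-O format.
constexpr int BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE = -1;
constexpr int BIND_SPECIAL_DYLIB_FLAT_LOOKUP = -2;
constexpr int BIND_SPECIAL_DYLIB_WEAK_LOOKUP = -3;

// Hypothetical stand-in for llvm::SignExtend32<B>: interprets the low B bits
// of X as a two's complement number. Written without shifts of negative
// values so the result does not depend on the implementation.
template <unsigned B> constexpr int32_t signExtend32Sketch(uint32_t X) {
  static_assert(B > 0 && B < 32, "bit width out of range for this sketch");
  return static_cast<int32_t>((X & ((1u << B) - 1)) ^ (1u << (B - 1))) -
         static_cast<int32_t>(1u << (B - 1));
}

// Only the three special values are sign-extended; every other ordinal is
// returned unchanged as a non-negative int.
template <typename T> constexpr int getEncodedOrdinalSketch(T Value) {
  return (Value == static_cast<T>(BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE) ||
          Value == static_cast<T>(BIND_SPECIAL_DYLIB_FLAT_LOOKUP) ||
          Value == static_cast<T>(BIND_SPECIAL_DYLIB_WEAK_LOOKUP))
             ? signExtend32Sketch<sizeof(T) * CHAR_BIT>(Value)
             : static_cast<int>(Value);
}
```

With an 8-bit field, 0xFE decodes to -2 (flat lookup) while 0xFC stays 252, which is why the special cases cannot simply be replaced by an unconditional sign extension.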

BertalanD updated this revision to Diff 453605. Aug 18 2022, 3:14 AM

BertalanD updated this revision to Diff 453607. Aug 18 2022, 3:16 AM

Harbormaster completed remote builds in B181957: Diff 453607. Aug 18 2022, 3:57 AM

thakis added inline comments. Aug 19 2022, 7:07 AM

llvm/include/llvm/BinaryFormat/MachO.h
2131	Huh, TIL, I suppose. What I used to believe up to now: There are two separate things: in-memory byte order, and order in which bitfields get assigned to their underlying words. The c99 standard (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf), 6.7.2.1.10: says about the latter "The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined." So I thought that the order bitfields are assigned is implementation-defined, but independent of byte order. But sysv abi (https://www.uclibc.org/docs/psABI-x86_64.pdf) paragraph 3.1 says: "bit-fields are allocated from right to left". So I thought, in practice, this settles things: Bit fields are assigned right-to-left (in a register, say), and then they assume system endianness when written to memory, conceptually. (In theory, gcc has all kinds of options here: https://gcc.gnu.org/onlinedocs/gccint/Storage-Layout.html BITS_BIG_ENDIAN, PCC_BITFIELD_TYPE_MATTERS, TARGET_MS_BITFIELD_LAYOUT_P, …) But https://godbolt.org/z/6sqo7nEd4 shows that this clearly isn't true. (Also, what's with the `65535` in the big-endian ppc64 output there? That looks completely wrong?) And CGRecordLayoutBuilder.cpp has: void CGRecordLowering::setBitFieldInfo( ... // Reverse the bit offsets for big endian machines. Because we represent // a bitfield as a single large integer load, we can imagine the bits // counting from the most-significant-bit instead of the // least-significant-bit. if (DataLayout.isBigEndian()) Info.Offset = Info.StorageSize - (Info.Offset + Info.Size); And gcc seems to match clang's behavior (see godbolt link), so that seems very intentional. Does anyone reading this know what specifies this? 
…aha, the ppc sysv abi (http://refspecs.linux-foundation.org/elf/elfspec_ppc.pdf) does say: " Bit-fields are allocated from right to left (least to most significant) on Little-Endian implementations and from left to right (most to least significant) on Big-Endian implementations." I wish I hadn't learned that!
llvm/lib/Object/MachOObjectFile.cpp
4935	(FWIW, I didn't know about `SignExtend32` either while I wrote the "looks weird" paragraph. I thought about how this could be written differently, then ran `rg -i signextend llvm` to see how other code handles this and then found SignExtend32 in there and figured it matches well enough.)
5004	The in-register view (i.e. not in-memory, we can ignore the byte-swapping of big-endian vs little-endian here) of

struct dyld_chained_import {
  uint32_t lib_ordinal : 8;
  uint32_t weak_import : 1;
  uint32_t name_offset : 23;
};

is:

In little-endian (bits assigned right-to-left):
most significant 23 bits are name_offset
then 1 bit weak_import
then 8 bit lib_ordinal in the lowest 8 bits

In big-endian (bits assigned left-to-right):
8 bit lib_ordinal in the highest 8 bits
then 1 bit weak_import
least significant 23 bits are name_offset

So I think even with the manual masking, this currently gets things wrong.

Also, it's weird that we don't use the dyld_chained_import structs at all. Strange for greppability, using something like http://llvm-cs.pcc.me.uk/ for cross-references, etc.

What do you think about adding back the Swap<> functions and manually swapping the bitfields around too after swapping the bytes? That assumes sysv abi, but that'll work almost everywhere (can't think of a place where it wouldn't? Maybe windows big endian? Didn't try that, but I wouldn't be surprised if clang-cl didn't work great there either), and if someone needs this to work on some obscure system, they can worry about it then?
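
The two in-register layouts described in the comment on line 5004 can be written down as mask-and-shift decoders. This is a sketch under the SysV-style allocation rules quoted in the thread; ImportFields, decodeRightToLeft and decodeLeftToRight are hypothetical names, not part of the diff.

```cpp
#include <cstdint>

// Decoded view of one dyld_chained_import entry (illustrative type).
struct ImportFields {
  uint32_t LibOrdinal;
  bool WeakImport;
  uint32_t NameOffset;
};

// Bitfields allocated right-to-left (little-endian ABIs): lib_ordinal sits in
// the lowest 8 bits, weak_import is bit 8, name_offset is the top 23 bits.
constexpr ImportFields decodeRightToLeft(uint32_t Raw) {
  return {Raw & 0xffu, ((Raw >> 8) & 1u) != 0, Raw >> 9};
}

// Bitfields allocated left-to-right (big-endian ABIs): lib_ordinal sits in
// the highest 8 bits, weak_import is bit 23, name_offset is the low 23 bits.
constexpr ImportFields decodeLeftToRight(uint32_t Raw) {
  return {Raw >> 24, ((Raw >> 23) & 1u) != 0, Raw & 0x7fffffu};
}

// The same logical entry (ordinal 0xFE, weak, name offset 0x10) has a
// different register value under each convention.
static_assert(decodeRightToLeft(0x000021FEu).LibOrdinal == 0xFEu, "");
static_assert(decodeLeftToRight(0xFE800010u).LibOrdinal == 0xFEu, "");
```

Both decoders take the word after any byte swapping, so they describe field placement only and say nothing about byte order in the file.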

thakis added inline comments. Aug 19 2022, 7:10 AM

llvm/lib/Object/MachOObjectFile.cpp
5004	Alternatively, something like https://github.com/freebsd/freebsd-src/blob/master/sys/netinet/ip.h#L51-L59 might also be an option.

We chatted a bit offline. Rough summary:

https://github.com/freebsd/freebsd-src/blob/master/sys/netinet/ip.h#L51-L59 works if the on-disk format is the same for BE and LE, but since $(xcrun -show-sdk-path)/usr/include/mach-o/fixup-chains.h doesn't have any ifdefs, that's likely not the case here
…but all chained fixup Apple platforms are little endian, so it's really unknowable
the current code does work on all hosts for when the on-disk data is little-endian
it'd be nice to do the swapping anyways, since that puts all the endianness handling in a single place
Bitfield swapping is different for little-endian-file-on-big-endian-host and the other way round, and the swapStruct() wrappers in BinaryFormat/MachO.h don't know which direction they're swapping in, so there'd have to be a dedicated wrapper for each struct. Example for dyld_chained_import:

// Little-endian file running on big-endian host:
Raw = ((Raw & 0xff) << 24) | ((Raw & 0x100) << 15) | (Raw >> 9);

// Big-endian file running on little-endian host:
Raw = (Raw >> 24) | ((Raw & 0x800000) >> 15) | ((Raw & 0x7fffff) << 9);
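
As a compile-time check of the two rearrangements for dyld_chained_import, the following sketch wraps each one in a constexpr function using bitwise OR throughout. The function names are hypothetical; the inputs are assumed to be already byte-swapped into host order.

```cpp
#include <cstdint>

// Little-endian file on big-endian host: move lib_ordinal from the low 8 bits
// to the high 8 bits, weak_import from bit 8 to bit 23, and name_offset from
// the top 23 bits to the bottom 23 bits.
constexpr uint32_t leFieldsToBe(uint32_t Raw) {
  return ((Raw & 0xffu) << 24) | ((Raw & 0x100u) << 15) | (Raw >> 9);
}

// Big-endian file on little-endian host: the inverse rearrangement.
constexpr uint32_t beFieldsToLe(uint32_t Raw) {
  return (Raw >> 24) | ((Raw & 0x800000u) >> 15) | ((Raw & 0x7fffffu) << 9);
}

// Each field lands where the other layout expects it.
static_assert(leFieldsToBe(0x000000ffu) == 0xff000000u, "lib_ordinal");
static_assert(leFieldsToBe(0x00000100u) == 0x00800000u, "weak_import");
static_assert(leFieldsToBe(0xfffffe00u) == 0x007fffffu, "name_offset");

// The two directions are inverses of each other.
static_assert(beFieldsToLe(leFieldsToBe(0x000021feu)) == 0x000021feu,
              "round trip");
```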

Since there's a bunch of those structs, it'd be nice to make a general helper for this. This seems to roughly work:

#include <stdio.h>

template<typename T, unsigned W>
T bitfield_swap_le(T t) {
  static_assert(W == 0, "");
  return 0;
}

template<typename T, unsigned W, unsigned First, int... Sizes>
T bitfield_swap_le(T t) {
  // Peel off rightmost field, add it on left, recurse.
  return ((t & ((1u << First) - 1)) << (W - First)) |
         bitfield_swap_le<T, W - First, Sizes...>(t >> First);
}

template<typename T, unsigned W>
T bitfield_swap_be(T t) {
  static_assert(W == 0, "");
  return 0;
}

template<typename T, unsigned W, unsigned First, int... Sizes>
T bitfield_swap_be(T t) {
  // Peel off leftmost field, add it on right, recurse.
  return ((t >> (W - First)) |
         (bitfield_swap_be<T, W - First, Sizes...>(t & ((1u << (W - First)) - 1))) << First);
}

int main() {
  {
    unsigned i = bitfield_swap_le<unsigned, 32, 8, 1, 23>(0xff);
    unsigned j = bitfield_swap_le<unsigned, 32, 8, 1, 23>(0x100);
    unsigned k = bitfield_swap_le<unsigned, 32, 8, 1, 23>(0xffff'fe00);
    printf("%08x %08x %08x\n", i, j, k);
  }
  {
    unsigned i = bitfield_swap_be<unsigned, 32, 8, 1, 23>(0xff00'0000);
    unsigned j = bitfield_swap_be<unsigned, 32, 8, 1, 23>(0x0080'0000);
    unsigned k = bitfield_swap_be<unsigned, 32, 8, 1, 23>(0x007f'ffff);
    printf("%08x %08x %08x\n", i, j, k);
  }
}

% clang bitfield_swap.cc -std=c++17
% ./a.out                          
ff000000 00800000 007fffff
000000ff 00000100 fffffe00

Then there could be a bitfield_swap<> member template that calls either of those based on file and host endianness, and that'd make it convenient to write a swapStruct() wrapper for each type with a bitfield.

However, all likely host systems are little-endian too, so it also seems fine to just go back to the original code, add a comment, mark the test REQUIRES: host-byteorder-little-endian and do the rest later (or have someone who actually needs it do it)
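
For illustration only, a direction-aware wrapper for the 32-bit dyld_chained_import word might look like the sketch below. Endian, byteSwap32, rearrangeLeToBe, rearrangeBeToLe and swapImportWord are all hypothetical names, and the sketch assumes that a big-endian file would use the left-to-right bitfield layout, which the thread notes cannot really be known.

```cpp
#include <cstdint>

enum class Endian { Little, Big };

// Reverses the four bytes of a 32-bit value.
constexpr uint32_t byteSwap32(uint32_t V) {
  return (V >> 24) | ((V >> 8) & 0x0000ff00u) | ((V << 8) & 0x00ff0000u) |
         (V << 24);
}

// Field rearrangements for the 8/1/23-bit layout of dyld_chained_import.
constexpr uint32_t rearrangeLeToBe(uint32_t Raw) {
  return ((Raw & 0xffu) << 24) | ((Raw & 0x100u) << 15) | (Raw >> 9);
}
constexpr uint32_t rearrangeBeToLe(uint32_t Raw) {
  return (Raw >> 24) | ((Raw & 0x800000u) >> 15) | ((Raw & 0x7fffffu) << 9);
}

// Raw is the word exactly as the host loaded it from the file. When file and
// host agree nothing happens; otherwise the bytes are reversed first and the
// bitfields are then moved in the direction the file's byte order implies.
constexpr uint32_t swapImportWord(uint32_t Raw, Endian File, Endian Host) {
  return File == Host ? Raw
         : File == Endian::Little ? rearrangeLeToBe(byteSwap32(Raw))
                                  : rearrangeBeToLe(byteSwap32(Raw));
}
```

The result is a word whose fields sit where the host compiler's bitfield layout expects them, so it could then be copied into the struct. Keeping the direction as an explicit parameter is what the existing swapStruct() overloads lack.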

Added some extra checks. Addends are now printed as signed integers, as in Apple otool.

Harbormaster completed remote builds in B182852: Diff 454844. Aug 23 2022, 10:27 AM

BertalanD added a child revision: D132560: [lld-macho] Add initial support for chained fixups. Aug 24 2022, 7:30 AM

Please land :)

llvm/lib/Object/MachOObjectFile.cpp
4996	(As discussed elsewhere, I think this is the worst of the approaches discussed above, but it's also not terribly important, so fine.)

Closed by commit rG686d8ce1ab16: [llvm-objdump] Complete -chained_fixups support (authored by BertalanD). Aug 24 2022, 10:29 AM

This revision was automatically updated to reflect the committed changes.

BertalanD added a commit: rG686d8ce1ab16: [llvm-objdump] Complete -chained_fixups support.

thakis mentioned this in D132036: [llvm-objdump] Add -dyld_info to llvm-otool. Aug 26 2022, 10:19 AM

Revision Contents

Path	Size
llvm/include/llvm/BinaryFormat/MachO.h	24 lines
llvm/include/llvm/Object/MachO.h	10 lines
llvm/lib/Object/MachOObjectFile.cpp	101 lines
llvm/test/tools/llvm-objdump/MachO/chained-fixups.test	22 lines
llvm/tools/llvm-objdump/MachODump.cpp	31 lines

Diff 453605

llvm/include/llvm/BinaryFormat/MachO.h

struct dyld_chained_starts_in_segment {
uint16_t pointer_format; ///< DYLD_CHAINED_PTR*		uint16_t pointer_format; ///< DYLD_CHAINED_PTR*
uint64_t segment_offset; ///< VM offset from the __TEXT segment		uint64_t segment_offset; ///< VM offset from the __TEXT segment
uint32_t max_valid_pointer; ///< Values beyond this are not pointers on 32-bit		uint32_t max_valid_pointer; ///< Values beyond this are not pointers on 32-bit
uint16_t page_count; ///< Length of the page_start array		uint16_t page_count; ///< Length of the page_start array
uint16_t page_start[1]; ///< Page offset of first fixup on each page, or		uint16_t page_start[1]; ///< Page offset of first fixup on each page, or
///< DYLD_CHAINED_PTR_START_NONE if no fixups		///< DYLD_CHAINED_PTR_START_NONE if no fixups
};		};

		// DYLD_CHAINED_IMPORT
		struct dyld_chained_import {
		uint32_t lib_ordinal : 8;
		uint32_t weak_import : 1;
		uint32_t name_offset : 23;
		};

		// DYLD_CHAINED_IMPORT_ADDEND
		struct dyld_chained_import_addend {
		uint32_t lib_ordinal : 8;
		uint32_t weak_import : 1;
		uint32_t name_offset : 23;
		int32_t addend;
		};

		// DYLD_CHAINED_IMPORT_ADDEND64
		struct dyld_chained_import_addend64 {
		uint64_t lib_ordinal : 16;
		uint64_t weak_import : 1;
		uint64_t reserved : 15;
		uint64_t name_offset : 32;
		uint64_t addend;
		};

// Byte order swapping functions for MachO structs		// Byte order swapping functions for MachO structs

inline void swapStruct(fat_header &mh) {		inline void swapStruct(fat_header &mh) {
sys::swapByteOrder(mh.magic);		sys::swapByteOrder(mh.magic);
sys::swapByteOrder(mh.nfat_arch);		sys::swapByteOrder(mh.nfat_arch);
}		}

inline void swapStruct(fat_arch &mh) {		inline void swapStruct(fat_arch &mh) {
inline void swapStruct(dyld_chained_starts_in_segment &C) {
sys::swapByteOrder(C.pointer_format);		sys::swapByteOrder(C.pointer_format);
sys::swapByteOrder(C.segment_offset);		sys::swapByteOrder(C.segment_offset);
sys::swapByteOrder(C.max_valid_pointer);		sys::swapByteOrder(C.max_valid_pointer);
sys::swapByteOrder(C.page_count);		sys::swapByteOrder(C.page_count);
// seg_info_offset entries must be byte swapped manually.		// seg_info_offset entries must be byte swapped manually.
}		}

/* code signing attributes of a process */		/* code signing attributes of a process */

enum CodeSignAttrs {		enum CodeSignAttrs {
CS_VALID = 0x00000001, /* dynamically valid */		CS_VALID = 0x00000001, /* dynamically valid */
CS_ADHOC = 0x00000002, /* ad hoc signed */		CS_ADHOC = 0x00000002, /* ad hoc signed */
CS_GET_TASK_ALLOW = 0x00000004, /* has get-task-allow entitlement */		CS_GET_TASK_ALLOW = 0x00000004, /* has get-task-allow entitlement */
CS_INSTALLER = 0x00000008, /* has installer entitlement */		CS_INSTALLER = 0x00000008, /* has installer entitlement */

CS_FORCED_LV =		CS_FORCED_LV =
0x00000010, /* Library Validation required by Hardened System Policy */		0x00000010, /* Library Validation required by Hardened System Policy */

llvm/include/llvm/Object/MachO.h

	/// WeakImport == true			/// WeakImport == true
	/// The associated bind may be set to 0 if this symbol is missing from its			/// The associated bind may be set to 0 if this symbol is missing from its
	/// parent library. This is called a "weak import."			/// parent library. This is called a "weak import."
	/// LibOrdinal == BIND_SPECIAL_DYLIB_WEAK_LOOKUP			/// LibOrdinal == BIND_SPECIAL_DYLIB_WEAK_LOOKUP
	/// This symbol may be coalesced with other libraries vending the same			/// This symbol may be coalesced with other libraries vending the same
	/// symbol. E.g., C++'s "operator new". This is called a "weak bind."			/// symbol. E.g., C++'s "operator new". This is called a "weak bind."
	struct ChainedFixupTarget {			struct ChainedFixupTarget {
	public:			public:
	ChainedFixupTarget(int LibOrdinal, StringRef Symbol, uint64_t Addend,			ChainedFixupTarget(int LibOrdinal, uint32_t NameOffset, StringRef Symbol,
	bool WeakImport)			uint64_t Addend, bool WeakImport)
	: LibOrdinal(LibOrdinal), SymbolName(Symbol), Addend(Addend),			: LibOrdinal(LibOrdinal), NameOffset(NameOffset), SymbolName(Symbol),
	WeakImport(WeakImport) {}			Addend(Addend), WeakImport(WeakImport) {}

	int libOrdinal() { return LibOrdinal; }			int libOrdinal() { return LibOrdinal; }
				uint32_t nameOffset() { return NameOffset; }
	StringRef symbolName() { return SymbolName; }			StringRef symbolName() { return SymbolName; }
	uint64_t addend() { return Addend; }			uint64_t addend() { return Addend; }
	bool weakImport() { return WeakImport; }			bool weakImport() { return WeakImport; }
	bool weakBind() {			bool weakBind() {
	return LibOrdinal == MachO::BIND_SPECIAL_DYLIB_WEAK_LOOKUP;			return LibOrdinal == MachO::BIND_SPECIAL_DYLIB_WEAK_LOOKUP;
	}			}

	private:			private:
	int LibOrdinal;			int LibOrdinal;
				uint32_t NameOffset;
	StringRef SymbolName;			StringRef SymbolName;
	uint64_t Addend;			uint64_t Addend;
	bool WeakImport;			bool WeakImport;
	};			};

	/// MachOAbstractFixupEntry is an abstract class representing a fixup in a			/// MachOAbstractFixupEntry is an abstract class representing a fixup in a
	/// MH_DYLDLINK file. Fixups generally represent rebases and binds. Binds also			/// MH_DYLDLINK file. Fixups generally represent rebases and binds. Binds also
	/// subdivide into additional subtypes (weak, lazy, reexport).			/// subdivide into additional subtypes (weak, lazy, reexport).

llvm/lib/Object/MachOObjectFile.cpp

#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/None.h"		#include "llvm/ADT/None.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/StringSwitch.h"		#include "llvm/ADT/StringSwitch.h"
#include "llvm/ADT/Triple.h"		#include "llvm/ADT/Triple.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
		#include "llvm/ADT/bit.h"
#include "llvm/BinaryFormat/MachO.h"		#include "llvm/BinaryFormat/MachO.h"
#include "llvm/BinaryFormat/Swift.h"		#include "llvm/BinaryFormat/Swift.h"
#include "llvm/Object/Error.h"		#include "llvm/Object/Error.h"
#include "llvm/Object/MachO.h"		#include "llvm/Object/MachO.h"
#include "llvm/Object/ObjectFile.h"		#include "llvm/Object/ObjectFile.h"
#include "llvm/Object/SymbolicFile.h"		#include "llvm/Object/SymbolicFile.h"
#include "llvm/Support/DataExtractor.h"		#include "llvm/Support/DataExtractor.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
for (size_t I = 0, N = ImageStarts.seg_count; I < N; ++I) {
}		}

Segments.emplace_back(I, *OffOrErr, Seg, std::move(PageStarts));		Segments.emplace_back(I, *OffOrErr, Seg, std::move(PageStarts));
}		}

return std::make_pair(ImageStarts.seg_count, Segments);		return std::make_pair(ImageStarts.seg_count, Segments);
}		}

		// The special library ordinals have a negative value, but they are encoded in
		// an unsigned bitfield, so we need to sign extend the value.
		template <typename T> static int getEncodedOrdinal(T Value) {
		if (Value == static_cast<T>(MachO::BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE) ||
		Value == static_cast<T>(MachO::BIND_SPECIAL_DYLIB_FLAT_LOOKUP) ||
		Value == static_cast<T>(MachO::BIND_SPECIAL_DYLIB_WEAK_LOOKUP))
		return SignExtend32<sizeof(T) * CHAR_BIT>(Value);
		return Value;
		}

		template <typename T, unsigned N>
		std::array<T, N> getArray(const MachOObjectFile &O, const void *Ptr) {
		std::array<T, N> RawValue;
		memcpy(RawValue.data(), Ptr, N * sizeof(T));
		if (O.isLittleEndian() != sys::IsLittleEndianHost)
		for (auto &Element : RawValue)
		sys::swapByteOrder(Element);
		return RawValue;
		}

Expected<std::vector<ChainedFixupTarget>>		Expected<std::vector<ChainedFixupTarget>>
MachOObjectFile::getDyldChainedFixupTargets() const {		MachOObjectFile::getDyldChainedFixupTargets() const {
		auto CFOrErr = getChainedFixupsLoadCommand();
		if (!CFOrErr)
		return CFOrErr.takeError();

		std::vector<ChainedFixupTarget> Targets;
		if (!CFOrErr->has_value())
		return Targets;

		const MachO::linkedit_data_command &DyldChainedFixups = **CFOrErr;

auto CFHeaderOrErr = getChainedFixupsHeader();		auto CFHeaderOrErr = getChainedFixupsHeader();
if (!CFHeaderOrErr)		if (!CFHeaderOrErr)
return CFHeaderOrErr.takeError();		return CFHeaderOrErr.takeError();
std::vector<ChainedFixupTarget> Targets;
if (!(*CFHeaderOrErr))		if (!(*CFHeaderOrErr))
return Targets;		return Targets;
		const MachO::dyld_chained_fixups_header &Header = **CFHeaderOrErr;

		size_t ImportSize = 0;
		if (Header.imports_format == MachO::DYLD_CHAINED_IMPORT)
		ImportSize = sizeof(MachO::dyld_chained_import);
		else if (Header.imports_format == MachO::DYLD_CHAINED_IMPORT_ADDEND)
		ImportSize = sizeof(MachO::dyld_chained_import_addend);
		else if (Header.imports_format == MachO::DYLD_CHAINED_IMPORT_ADDEND64)
		ImportSize = sizeof(MachO::dyld_chained_import_addend64);
		else
		return malformedError("bad chained fixups: unknown imports format: " +
		Twine(Header.imports_format));

		const char *Contents = getPtr(*this, DyldChainedFixups.dataoff);
		const char *Imports = Contents + Header.imports_offset;
		size_t ImportsEndOffset =
		Header.imports_offset + ImportSize * Header.imports_count;
		const char *ImportsEnd = Contents + ImportsEndOffset;
		const char *Symbols = Contents + Header.symbols_offset;
		const char *SymbolsEnd = Contents + DyldChainedFixups.datasize;

		if (ImportsEnd > Symbols)
		return malformedError("bad chained fixups: imports end " +
		Twine(ImportsEndOffset) + " extends past end " +
		Twine(DyldChainedFixups.datasize));

		if (ImportsEnd > Symbols)
		return malformedError("bad chained fixups: imports end " +
		Twine(ImportsEndOffset) + " overlaps with symbols");

		for (const char *ImportPtr = Imports; ImportPtr < ImportsEnd;
		ImportPtr += ImportSize) {
		int LibOrdinal;
		bool WeakImport;
		uint32_t NameOffset;
		uint64_t Addend;
		if (Header.imports_format == MachO::DYLD_CHAINED_IMPORT) {
		auto RawValue = getArray<uint32_t, 1>(*this, ImportPtr);

		LibOrdinal = getEncodedOrdinal<uint8_t>(RawValue[0] & 0xFF);
		WeakImport = (RawValue[0] >> 8) & 1;
		NameOffset = RawValue[0] >> 9;
		Addend = 0;
		} else if (Header.imports_format == MachO::DYLD_CHAINED_IMPORT_ADDEND) {
		auto RawValue = getArray<uint32_t, 2>(*this, ImportPtr);

		LibOrdinal = getEncodedOrdinal<uint8_t>(RawValue[0] & 0xFF);
		WeakImport = (RawValue[0] >> 8) & 1;
		NameOffset = RawValue[0] >> 9;
		Addend = bit_cast<int32_t>(RawValue[1]);
		} else if (Header.imports_format == MachO::DYLD_CHAINED_IMPORT_ADDEND64) {
		auto RawValue = getArray<uint64_t, 2>(*this, ImportPtr);

		LibOrdinal = getEncodedOrdinal<uint16_t>(RawValue[0] & 0xFFFF);
		NameOffset = (RawValue[0] >> 16) & 1;
		WeakImport = RawValue[0] >> 17;
		Addend = RawValue[1];
		} else {
		llvm_unreachable("Import format should have been checked");
		}

		const char *Str = Symbols + NameOffset;
		if (Str >= SymbolsEnd)
		return malformedError("bad chained fixups: symbol offset " +
		Twine(NameOffset) + " extends past end " +
		Twine(DyldChainedFixups.datasize));
		Targets.emplace_back(LibOrdinal, NameOffset, Str, Addend, WeakImport);
		}

return Targets;		return Targets;
}		}

ArrayRef<uint8_t> MachOObjectFile::getDyldInfoExportsTrie() const {		ArrayRef<uint8_t> MachOObjectFile::getDyldInfoExportsTrie() const {
if (!DyldInfoLoadCmd)		if (!DyldInfoLoadCmd)
return None;		return None;

auto DyldInfoOrErr =		auto DyldInfoOrErr =
[... 159 lines not shown ...]

llvm/test/tools/llvm-objdump/MachO/chained-fixups.test

	[... 37 lines not shown ...]
	DETAILS-NEXT: pointer_format = 6 (DYLD_CHAINED_PTR_64_OFFSET)			DETAILS-NEXT: pointer_format = 6 (DYLD_CHAINED_PTR_64_OFFSET)
	DETAILS-NEXT: segment_offset = 0x3f0			DETAILS-NEXT: segment_offset = 0x3f0
	DETAILS-NEXT: max_valid_pointer = 0			DETAILS-NEXT: max_valid_pointer = 0
	DETAILS-NEXT: page_count = 4			DETAILS-NEXT: page_count = 4
	DETAILS-NEXT: page_start[0] = 0			DETAILS-NEXT: page_start[0] = 0
	DETAILS-NEXT: page_start[1] = 32			DETAILS-NEXT: page_start[1] = 32
	DETAILS-NEXT: page_start[2] = 65535 (DYLD_CHAINED_PTR_START_NONE)			DETAILS-NEXT: page_start[2] = 65535 (DYLD_CHAINED_PTR_START_NONE)
	DETAILS-NEXT: page_start[3] = 32			DETAILS-NEXT: page_start[3] = 32
				DETAILS-NEXT: dyld chained import[0]
				DETAILS-NEXT: lib_ordinal = -2 (flat-namespace)
				DETAILS-NEXT: weak_import = 0
				DETAILS-NEXT: name_offset = 1 (_dynamicLookup)
				DETAILS-NEXT: dyld chained import[1]
				DETAILS-NEXT: lib_ordinal = 1 (libdylib)
				DETAILS-NEXT: weak_import = 1
				DETAILS-NEXT: name_offset = 16 (_weakImport)
				DETAILS-NEXT: dyld chained import[2]
				DETAILS-NEXT: lib_ordinal = 1 (libdylib)
				DETAILS-NEXT: weak_import = 0
				DETAILS-NEXT: name_offset = 28 (_dylib)
				DETAILS-NEXT: dyld chained import[3]
				DETAILS-NEXT: lib_ordinal = -3 (weak)
				DETAILS-NEXT: weak_import = 0
				DETAILS-NEXT: name_offset = 35 (_weakLocal)
				DETAILS-NEXT: dyld chained import[4]
				DETAILS-NEXT: lib_ordinal = -3 (weak)
				DETAILS-NEXT: weak_import = 0
				DETAILS-NEXT: name_offset = 46 (_weak)
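
The negative lib_ordinal values in these expectations come from an unsigned bit-field (8 bits wide in the plain import format). A minimal sketch of how such a field could be mapped back to the special negative ordinals follows; `decodeOrdinal` is a hypothetical stand-in and is not claimed to match the patch's `getEncodedOrdinal`, and the three constants are local copies of the BIND_SPECIAL_DYLIB_* values.

```cpp
#include <cstdint>

// Local copies of the special ordinals (BIND_SPECIAL_DYLIB_* in MachO.h).
constexpr int SpecialMainExecutable = -1;
constexpr int SpecialFlatLookup = -2;
constexpr int SpecialWeakLookup = -3;

// Hypothetical decoder: T is the unsigned storage type of the field
// (uint8_t or uint16_t). Values that match a special ordinal truncated to
// T are sign-extended; every other value is a plain 1-based dylib index.
template <typename T> constexpr int decodeOrdinal(T Value) {
  if (Value == static_cast<T>(SpecialMainExecutable) ||
      Value == static_cast<T>(SpecialFlatLookup) ||
      Value == static_cast<T>(SpecialWeakLookup))
    return static_cast<int>(Value) - (1 << (sizeof(T) * 8));
  return static_cast<int>(Value);
}
```

Under this sketch an 8-bit field holding 0xFE yields -2 (printed as flat-namespace) and 0xFD yields -3 (printed as weak), while 1 stays 1 and is resolved to a library name.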

	## This test checks that the output is identical to that of cctools-1001.2 (XCode 14)			## This test checks that the output is identical to that of cctools-1001.2 (XCode 14)
				## FIXME: Print encoded values of the dyld_chained_import* entries
				##
	## The input was generated from the following files:			## The input was generated from the following files:
	##			##
	## --- dylib.s:			## --- dylib.s:
	## .data			## .data
	## .globl _weak, _dylib, _weakImport			## .globl _weak, _dylib, _weakImport
	## .weak_definition _weak			## .weak_definition _weak
	## _weak:			## _weak:
	## _dylib:			## _dylib:
	[... 31 lines not shown ...]

llvm/tools/llvm-objdump/MachODump.cpp


[... 88 lines not shown ...]
bool objdump::ObjcMetaData;		bool objdump::ObjcMetaData;
std::string objdump::DisSymName;		std::string objdump::DisSymName;
bool objdump::SymbolicOperands;		bool objdump::SymbolicOperands;
static std::vector<std::string> ArchFlags;		static std::vector<std::string> ArchFlags;

static bool ArchAll = false;		static bool ArchAll = false;
static std::string ThumbTripleName;		static std::string ThumbTripleName;

		static StringRef ordinalName(const object::MachOObjectFile *, int);

void objdump::parseMachOOptions(const llvm::opt::InputArgList &InputArgs) {		void objdump::parseMachOOptions(const llvm::opt::InputArgList &InputArgs) {
FirstPrivateHeader = InputArgs.hasArg(OBJDUMP_private_header);		FirstPrivateHeader = InputArgs.hasArg(OBJDUMP_private_header);
ExportsTrie = InputArgs.hasArg(OBJDUMP_exports_trie);		ExportsTrie = InputArgs.hasArg(OBJDUMP_exports_trie);
Rebase = InputArgs.hasArg(OBJDUMP_rebase);		Rebase = InputArgs.hasArg(OBJDUMP_rebase);
Rpaths = InputArgs.hasArg(OBJDUMP_rpaths);		Rpaths = InputArgs.hasArg(OBJDUMP_rpaths);
Bind = InputArgs.hasArg(OBJDUMP_bind);		Bind = InputArgs.hasArg(OBJDUMP_bind);
LazyBind = InputArgs.hasArg(OBJDUMP_lazy_bind);		LazyBind = InputArgs.hasArg(OBJDUMP_lazy_bind);
WeakBind = InputArgs.hasArg(OBJDUMP_weak_bind);		WeakBind = InputArgs.hasArg(OBJDUMP_weak_bind);
[... 1,172 lines not shown ...]	for (auto [Index, PageStart] : enumerate(Segment.PageStarts)) {
outs() << " page_start[" << Index << "] = " << PageStart;		outs() << " page_start[" << Index << "] = " << PageStart;
// FIXME: Support DYLD_CHAINED_PTR_START_MULTI (32-bit only)		// FIXME: Support DYLD_CHAINED_PTR_START_MULTI (32-bit only)
if (PageStart == MachO::DYLD_CHAINED_PTR_START_NONE)		if (PageStart == MachO::DYLD_CHAINED_PTR_START_NONE)
outs() << " (DYLD_CHAINED_PTR_START_NONE)";		outs() << " (DYLD_CHAINED_PTR_START_NONE)";
outs() << '\n';		outs() << '\n';
}		}
}		}

		static void PrintChainedFixupTarget(ChainedFixupTarget &Target, size_t Idx,
		int Format, MachOObjectFile *O) {
		if (Format == MachO::DYLD_CHAINED_IMPORT)
		outs() << "dyld chained import";
		else if (Format == MachO::DYLD_CHAINED_IMPORT_ADDEND)
		outs() << "dyld chained import addend";
		else if (Format == MachO::DYLD_CHAINED_IMPORT_ADDEND64)
		outs() << "dyld chained import addend64";
		// FIXME: otool prints the encoded value as well.
		thakis: It makes sense to me to wait with this until the upstreaming has happened to see how to best implement this then.
		outs() << '[' << Idx << "]\n";

		outs() << " lib_ordinal = " << Target.libOrdinal() << " ("
		<< ordinalName(O, Target.libOrdinal()) << ")\n";
		outs() << " weak_import = " << Target.weakImport() << '\n';
		outs() << " name_offset = " << Target.nameOffset() << " ("
		<< Target.symbolName() << ")\n";
		if (Format != MachO::DYLD_CHAINED_IMPORT)
		outs() << " addend = " << Target.addend() << '\n';
		}

static void PrintChainedFixups(MachOObjectFile *O) {		static void PrintChainedFixups(MachOObjectFile *O) {
// MachOObjectFile::getChainedFixupsHeader() reads LC_DYLD_CHAINED_FIXUPS.		// MachOObjectFile::getChainedFixupsHeader() reads LC_DYLD_CHAINED_FIXUPS.
// FIXME: Support chained fixups in __TEXT,__chain_starts section too.		// FIXME: Support chained fixups in __TEXT,__chain_starts section too.
auto ChainedFixupHeader =		auto ChainedFixupHeader =
unwrapOrError(O->getChainedFixupsHeader(), O->getFileName());		unwrapOrError(O->getChainedFixupsHeader(), O->getFileName());
if (!ChainedFixupHeader)		if (!ChainedFixupHeader)
return;		return;

[... 16 lines not shown ...]	for (size_t I = 0; I < SegCount; ++I) {

outs() << " seg_offset[" << I << "] = " << SegOffset << " ("		outs() << " seg_offset[" << I << "] = " << SegOffset << " ("
<< SegNames[I] << ")\n";		<< SegNames[I] << ")\n";
}		}

for (const MachOObjectFile::ChainedFixupsSegment &S : Segments)		for (const MachOObjectFile::ChainedFixupsSegment &S : Segments)
PrintChainedFixupsSegment(S, SegNames[S.SegIdx]);		PrintChainedFixupsSegment(S, SegNames[S.SegIdx]);

// FIXME: Print more things.		auto FixupTargets =
		unwrapOrError(O->getDyldChainedFixupTargets(), O->getFileName());

		uint32_t ImportsFormat = ChainedFixupHeader->imports_format;
		for (auto [Idx, Target] : enumerate(FixupTargets))
		PrintChainedFixupTarget(Target, Idx, ImportsFormat, O);
}		}

static void PrintDyldInfo(MachOObjectFile *O) {		static void PrintDyldInfo(MachOObjectFile *O) {
outs() << "dyld information:" << '\n';		outs() << "dyld information:" << '\n';
printMachOChainedFixups(O);		printMachOChainedFixups(O);
}		}

static void PrintDylibs(MachOObjectFile *O, bool JustId) {		static void PrintDylibs(MachOObjectFile *O, bool JustId) {
[... 9,177 lines not shown ...]	static StringRef ordinalName(const object::MachOObjectFile *Obj, int Ordinal) {
StringRef DylibName;		StringRef DylibName;
switch (Ordinal) {		switch (Ordinal) {
case MachO::BIND_SPECIAL_DYLIB_SELF:		case MachO::BIND_SPECIAL_DYLIB_SELF:
return "this-image";		return "this-image";
case MachO::BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE:		case MachO::BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE:
return "main-executable";		return "main-executable";
case MachO::BIND_SPECIAL_DYLIB_FLAT_LOOKUP:		case MachO::BIND_SPECIAL_DYLIB_FLAT_LOOKUP:
return "flat-namespace";		return "flat-namespace";
		case MachO::BIND_SPECIAL_DYLIB_WEAK_LOOKUP:
		return "weak";
default:		default:
if (Ordinal > 0) {		if (Ordinal > 0) {
std::error_code EC =		std::error_code EC =
Obj->getLibraryShortNameByIndex(Ordinal - 1, DylibName);		Obj->getLibraryShortNameByIndex(Ordinal - 1, DylibName);
if (EC)		if (EC)
return "<<bad library ordinal>>";		return "<<bad library ordinal>>";
return DylibName;		return DylibName;
}		}
[... 163 lines not shown ...]