This is an archive of the discontinued LLVM Phabricator instance.


Differential D128093

[lld-macho] Initial support for Linker Optimization Hints
Closed, Public

Authored by BertalanD on Jun 17 2022, 1:06 PM.


Details

Reviewers

int3
thakis

Group Reviewers

Restricted Project

Commits

rGa3f67f0920ea: [lld-macho] Initial support for Linker Optimization Hints

Summary

Linker optimization hints mark a sequence of
instructions used for synthesizing an address, like ADRP+ADD. If the
referenced symbol ends up close enough, the sequence can be replaced by a
faster one like ADR+NOP.

This commit adds support for 2 of the 7 defined ARM64 optimization
hints:

LOH_ARM64_ADRP_ADD, which transforms a pair of ADRP+ADD into ADR+NOP if the referenced address is within +/- 1 MiB
LOH_ARM64_ADRP_ADRP, which transforms two ADRP instructions into ADRP+NOP if they reference the same page

These two kinds already cover more than 50% of all cases in Chromium.
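
The two eligibility conditions in the list above can be sketched as follows. This is a minimal illustration rather than the code from this revision: the helper names `fitsInAdr` and `onSamePage` are hypothetical, and a 4 KiB page size is assumed, as in the revision's own code comments.

```cpp
#include <cassert>
#include <cstdint>

// ADR carries a signed 21-bit byte offset, so it reaches [-2^20, 2^20).
constexpr int64_t kAdrRange = int64_t{1} << 20;

// Hypothetical helper: true if `target` is reachable from the instruction
// at address `pc` with a single ADR (the ADRP+ADD -> ADR+NOP condition).
bool fitsInAdr(uint64_t pc, uint64_t target) {
  int64_t delta = static_cast<int64_t>(target - pc);
  return delta >= -kAdrRange && delta < kAdrRange;
}

// Hypothetical helper: true if both addresses lie on the same 4 KiB page,
// in which case a second ADRP into the same register is redundant.
bool onSamePage(uint64_t a, uint64_t b) {
  return (a & ~0xfffULL) == (b & ~0xfffULL);
}
```

Note that the range is asymmetric: exactly -1 MiB is reachable, while exactly +1 MiB is not.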

I test-built Chromium with this patch applied.

The added overhead on linking Chromium is 200 milliseconds on my M1 Mac mini, which accounts for about 4% of the total runtime. About half of that is used for parsing the optimization hint data, and the rest is spent actually checking if the optimization can be done and performing it.

Suggestions on how to make it quicker are appreciated :)

Please commit this diff with the following author info:
Daniel Bertalan <dani@danielbertalan.dev>

Diff Detail

Unit Tests: Failed

	Time	Test
	60,030 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test

Event Timeline

BertalanD created this revision. Jun 17 2022, 1:06 PM

Herald added projects: Restricted Project, Restricted Project. Jun 17 2022, 1:06 PM

Herald added a subscriber: kristof.beyls.

BertalanD requested review of this revision. Jun 17 2022, 1:06 PM

Herald added a project: Restricted Project. Jun 17 2022, 1:06 PM

Herald added a subscriber: llvm-commits.

Check for !ignoreOptimizationHints before inserting into performedRelocs.

BertalanD edited the summary of this revision. Jun 17 2022, 1:15 PM

Harbormaster completed remote builds in B170597: Diff 438025. Jun 17 2022, 2:51 PM

Reserve the expected size of performedRelocations upfront
Store the optimization hints into a large per-object vector instead of a per-(sub)section one. Brings down the overhead to just above 7% when linking Chromium.

Herald added a subscriber: mgrang. Jun 18 2022, 10:57 AM

Harbormaster completed remote builds in B170680: Diff 438136. Jun 18 2022, 10:58 AM

tschuett added inline comments. Jun 18 2022, 11:01 AM

lld/MachO/Arch/ARM64.cpp
167	You could move the ADD into the anonymous namespace above.
238	Why are the static functions in an anonymous namespace?

Here is the reference for anonymous namespaces:
https://llvm.org/docs/CodingStandards.html#anonymous-namespaces

I spent some time searching and found your enum values here:
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/MC/MCLinkerOptimizationHint.h#L33

Should we use the MC constants in lld? Or use the Mach-O constants in MC?

Comply with the Coding Standards regarding the use of anonymous namespaces. Take ArrayRef instead of const std::vector& as parameter.

BertalanD marked 2 inline comments as done. Jun 19 2022, 2:40 AM

Harbormaster completed remote builds in B170719: Diff 438187. Jun 19 2022, 2:41 AM

grandinj added a subscriber: grandinj. Jun 19 2022, 5:02 AM

grandinj added inline comments.

lld/MachO/InputFiles.cpp
509	I'm guessing if you run a profiler over this (like perf), you will see significant time spent in resizing/re-allocating the vector. You could avoid this by calling reserve before the loop to reserve storage. Alternatively, just use std::deque instead. It handles this case well, and doesn't need to re-allocate/resize.

In D128093#3594579, @BertalanD wrote:

Should we use the MC constants in lld? Or use the Mach-O constants in MC?

MC depends on BinaryFormat, so I think making MC use the BinaryFormat constants feels a bit better.

(lld does also depend on MC, but only for StringTableBuilder (and indirectly for LTO), but that feels a bit strange to me too.)

Pre-allocated the vector storing the parsed optimization hints. Changed parsedRelocs to be a sorted array instead of a DenseMap. This decreased the runtime penalty to under 200 ms when linking Chromium on an M1 Mac mini. (Chromium has over 4 million LOHs!)

BertalanD marked an inline comment as done. Jun 19 2022, 10:42 AM

BertalanD added inline comments.

lld/MachO/InputFiles.cpp
509	We need `optimizationHints` to be a contiguous block of memory if we want to store an `ArrayRef<OptimizationHint>` in each (sub)section. That way, it's faster than making many smaller `vector<OptimizationHint>` allocations for each `InputSection`. The data is encoded in the variable length ULEB128 format, so we don't exactly know how many elements there are until we've parsed the section fully and encountered the terminating element. But we can make an (over)estimation.
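
The reserve-then-parse approach described in the comment above could look roughly like this. It is a sketch under stated assumptions, not the code from InputFiles.cpp: the `Hint` record, `readULEB128`, and `parseHints` are hypothetical names, and the stream layout is assumed to be a ULEB128 kind, a ULEB128 argument count, then that many ULEB128 addresses, terminated by a kind of 0. Assuming every record has at least two addresses, each record occupies at least 4 bytes, so `size / 4` is an upper bound on the element count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size record; lld's real OptimizationHint differs.
struct Hint {
  uint64_t kind;
  uint64_t addresses[3];
  unsigned argCount;
};

// Decodes one ULEB128 value at data[pos] and advances pos.
// Returns false if the input ends in the middle of a value.
bool readULEB128(const std::vector<uint8_t> &data, size_t &pos, uint64_t &out) {
  out = 0;
  for (unsigned shift = 0; pos < data.size() && shift < 64; shift += 7) {
    uint8_t byte = data[pos++];
    out |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0)
      return true;
  }
  return false;
}

std::vector<Hint> parseHints(const std::vector<uint8_t> &data) {
  std::vector<Hint> hints;
  // Overestimate: kind + count + two addresses need at least 4 bytes, so
  // the vector never reallocates while parsing (assumed bound, see above).
  hints.reserve(data.size() / 4);
  size_t pos = 0;
  uint64_t kind = 0, argCount = 0;
  while (readULEB128(data, pos, kind) && kind != 0 &&
         readULEB128(data, pos, argCount) && argCount >= 2 && argCount <= 3) {
    Hint hint{kind, {0, 0, 0}, static_cast<unsigned>(argCount)};
    for (unsigned i = 0; i < hint.argCount; ++i)
      if (!readULEB128(data, pos, hint.addresses[i]))
        return hints; // truncated input: keep what was parsed so far
    hints.push_back(hint);
  }
  return hints;
}
```

Because the storage is reserved up front and never grows past the bound, slices of the vector stay valid and can be handed out as contiguous ranges per (sub)section.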

The sorted array is a great idea:
https://llvm.org/docs/ProgrammersManual.html#a-sorted-vector
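
For readers unfamiliar with the pattern, a sorted vector used as a lookup table can be sketched like this. The `Entry` layout (offset, resolved address) and the function names are assumptions made for illustration; the revision's actual data structure is not shown in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical entry: (relocation offset, resolved address).
using Entry = std::pair<uint64_t, uint64_t>;

// Sort once, after all insertions are done.
void finalize(std::vector<Entry> &table) {
  std::sort(table.begin(), table.end());
}

// Binary search by offset. Returns a pointer to the resolved address,
// or nullptr if no entry has that offset.
const uint64_t *lookup(const std::vector<Entry> &table, uint64_t offset) {
  auto it = std::lower_bound(
      table.begin(), table.end(), offset,
      [](const Entry &e, uint64_t key) { return e.first < key; });
  if (it == table.end() || it->first != offset)
    return nullptr;
  return &it->second;
}
```

Compared with a hash map, this trades O(1) lookups for O(log n) ones, but it allocates once and keeps the entries contiguous, which tends to matter when there are millions of them.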

BertalanD edited the summary of this revision. Jun 19 2022, 10:47 AM

BertalanD marked an inline comment as done.

Sorry! I have another trick for you:
https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop

grandinj added inline comments. Jun 19 2022, 11:03 AM

lld/MachO/Relocations.h
84	(*) offsets[2] is never used, so it looks like this array can drop the 3rd element.
(*) I suspect it may in fact be cheaper to drop the minOffset field and make it a member function that calculates the minimum each time, since the code that touches this is likely to be dominated by cache-hit ratio, and making this structure smaller will increase that ratio, but you'd have to benchmark that.
(*) Which brings me to: are these offsets completely independent of each other, or could they be expressed, for example, as `uint64_t offset_base; uint32_t offset_offsets[2]; // add this to offset_base to get the actual offset`? That would further increase cache density.
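
The base-plus-delta layout suggested in this comment can be sketched as below. It is an illustration under assumptions: `PackedHint`, `pack`, and `address` are hypothetical names, and the deltas are 16 bits wide (the width the author reports adopting later in this thread) rather than the 32 bits floated here.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical packed record: one full-width base plus small signed deltas.
struct PackedHint {
  uint64_t offset0; // first address
  int16_t delta[2]; // second and third address, relative to offset0
  uint8_t type;
};
static_assert(sizeof(PackedHint) == 16, "expected 16 bytes on common ABIs");

// Packs three addresses; returns false if a delta does not fit in 16 bits,
// so the caller can reject the hint instead of silently truncating.
bool pack(uint8_t type, uint64_t a0, uint64_t a1, uint64_t a2,
          PackedHint &out) {
  const uint64_t addrs[2] = {a1, a2};
  out = PackedHint{a0, {0, 0}, type};
  for (int i = 0; i < 2; ++i) {
    int64_t d = static_cast<int64_t>(addrs[i] - a0);
    if (d < std::numeric_limits<int16_t>::min() ||
        d > std::numeric_limits<int16_t>::max())
      return false;
    out.delta[i] = static_cast<int16_t>(d);
  }
  return true;
}

// Recovers the i-th address (0, 1 or 2) from the packed form.
uint64_t address(const PackedHint &h, int i) {
  assert(i >= 0 && i < 3);
  return i == 0 ? h.offset0 : h.offset0 + h.delta[i - 1];
}
```

For a two-address hint, the unused third slot can simply repeat the first address, which yields a delta of 0.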

Harbormaster completed remote builds in B170731: Diff 438204. Jun 19 2022, 11:37 AM

For adrp+add, you may take some inspiration from D117614. Note the needed tests as well.

At least for applyOptimizationHints, this could help https://reviews.llvm.org/D128140.

(*) offsets[2] is never used, so it looks like this array can drop the 3rd element.

The other types of optimization hints do make use of this third parameter, and I plan to add them very soon.

I'll look into packing the OptimizationHint struct a bit better tomorrow. That would sacrifice some generality, but I don't think we need to worry about pairs of instructions that are so far apart. Note to self: if it does happen, fail gracefully.

Note the needed tests as well.

@MaskRay What other tests should I add? I'm definitely going to add a case where the target is at a lower VM address than the instruction pair. Testing all the other failure paths (no relocations at the specified address, the relocations point at a different address, the registers don't match, the target instructions aren't actually ADRP/ADD) sounds a bit excessive, especially as these can only happen if the input is malformed. But if you think there's some value in that, I will of course add those.

At least for applyOptimizationHints, this could help https://reviews.llvm.org/D128140.

Sections are already relocated in parallel, which includes performing these relaxations. I don't think there are any guarantees that for each relocation, there's a unique LOH, so we'd open up ourselves to concurrency bugs if we started modifying the same InputSection's output buffer in parallel.

The added overhead on linking Chromium is 200 milliseconds on my M1 Mac mini, which accounts for about 4% of the total runtime.

Is there measurable overhead when -ignore_optimization_hints is passed? I'm wondering about the overhead from increasing the Reloc struct size

4% overhead is not such a big deal IMO if we can disable it in developer builds (though the fact that it's enabled by default is unfortunate)

lld/MachO/Arch/ARM64.cpp
167	nit: I would prefer `parseAdrp` (would also be more consistent with `applyAdrpAdrp` below)
184	`op` or `opcode` would be more accurate
214–215	do we ever expect an optimization hint to be emitted for instructions whose src/dest don't match? does ld64 perform this same validation? ditto for the referentVA check below
lld/MachO/InputFiles.cpp
453	can you include a comment block explaining what optimization hints are? something like what you put down in the commit message would be good. Would also love an explanation of how `minOffset` is important. we don't always have the most detailed comments, but if you check the top of InputFiles.cpp, there's a pretty detailed comment about the Mach-O format -- that's a good ideal to strive for :)
492–493	do we actually expect this to happen in practice? can we just error out and return early?

In D128093#3599802, @int3 wrote:

The added overhead on linking Chromium is 200 milliseconds on my M1 Mac mini, which accounts for about 4% of the total runtime.

Is there measurable overhead when -ignore_optimization_hints is passed? I'm wondering about the overhead from increasing the Reloc struct size

4% overhead is not such a big deal IMO if we can disable it in developer builds (though the fact that it's enabled by default is unfortunate)

(This makes sense to me, FWIW!)

(Also, not sure it's such a bad thing that it defaults to on. Smaller projects won't notice the link time cost, and larger projects will have lots of tweaking flags for better or worse anyways…)

Is there measurable overhead when -ignore_optimization_hints is passed? I'm wondering about the overhead from increasing the Reloc struct size

Here are the measurements taken on a 32 vCPU Google Compute Engine instance:

x before.txt
+ after.txt
+------------------------------------------------------------------------------+
|                                   x  x       x   +   +  +                    |
|x     x x x       x x * x      ++x x++x x xxx * + +  ++  + ++   +   +x +     +|
|              |________________A___M_|__________|__A__M__________|            |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  20     11.418539     11.729058     11.576444     11.557527   0.076616973
+  20      11.51961      11.76536     11.661534     11.649677   0.063851569
Difference at 95.0% confidence
	0.0921498 +/- 0.0451383
	0.797314% +/- 0.390554%
	(Student's t, pooled s = 0.0705237)

With --threads=1, it says "No difference proven at 95.0% confidence".
I'll check if there are any opportunities to pack the InputSection class a bit better.

BertalanD added inline comments. Jun 23 2022, 7:39 AM

lld/MachO/Arch/ARM64.cpp
214–215	ld64 checks these conditions and emits a warning if they don't hold. I assume it means that these are expected to be true for valid objects. I tested it with this file:
    .align 2
    .globl _main
    _main:
    L1: adrp x0, _foo@PAGE
    L2: add x1, x1, _foo@PAGEOFF
    L3: add x0, x0, _bar@PAGEOFF
    L4: nop
    .loh AdrpAdd L1, L2
    .loh AdrpAdd L1, L3
    .loh AdrpAdd L1, L4
    _foo: .long 0
    _bar: .long 0
This is ld64's output:
    ld: warning: ignoring linker optimization hint at _main+0x0 because adrpInfoA.destReg == addInfoB.srcReg
    ld: warning: ignoring linker optimization hint at _main+0x0 because infoA.target == infoB.target
    ld: warning: ignoring linker optimization hint at _main+0x0 because isPageOffsetKind(infoB.fixup)
Do you think we should add similar diagnostics?
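
The instruction-level part of this validation can be sketched as below. The two parsers are adapted from `parseAdrp` and `parseAdd` in this revision's ARM64.cpp; `isAdrpAddPair` is a hypothetical wrapper added for illustration, and the check that both relocations point at the same target is left out.

```cpp
#include <cassert>
#include <cstdint>

struct Adrp {
  uint32_t destRegister;
};

struct Add {
  uint8_t destRegister;
  uint8_t srcRegister;
  uint32_t addend;
};

// Matches `adrp xN, ...` and extracts the destination register.
bool parseAdrp(uint32_t insn, Adrp &adrp) {
  if ((insn & 0x9f000000) != 0x90000000)
    return false;
  adrp.destRegister = insn & 0x1f;
  return true;
}

// Matches the 64-bit `add xM, xN, #imm` form with an unshifted immediate.
bool parseAdd(uint32_t insn, Add &add) {
  if ((insn & 0xffc00000) != 0x91000000)
    return false;
  add.destRegister = insn & 0x1f;
  add.srcRegister = (insn >> 5) & 0x1f;
  add.addend = (insn >> 10) & 0xfff;
  return true;
}

// Hypothetical wrapper: the pair is only a candidate if the ADD reads the
// register that the ADRP wrote.
bool isAdrpAddPair(uint32_t first, uint32_t second) {
  Adrp adrp;
  Add add;
  return parseAdrp(first, adrp) && parseAdd(second, add) &&
         adrp.destRegister == add.srcRegister;
}
```

With the encodings from the test file above, `adrp x0` followed by `add x1, x1` is rejected because the registers differ, and `adrp x0` followed by `nop` is rejected because the second instruction is not an ADD.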
lld/MachO/InputFiles.cpp
453	Comments are a good idea (especially since this feature is a bit obscure -- I personally didn't know about it before I looked through the GitHub issues). I'm going to add a general description to this function. Additionally, I'll explain each kind individually in the corresponding `performFoo` function.
492–493	I added it for forward compatibility. However, it looks like the LLVM pass that emits these hints hasn't been changed substantially for a long time, so it's probably safe to say that we won't be getting any new types that will need to be handled.

int3 added inline comments. Jun 23 2022, 8:15 AM

lld/MachO/Arch/ARM64.cpp
214–215	Thanks for checking!
> Do you think we should add similar diagnostics?
Wouldn't hurt, but I wouldn't consider it high pri either.
We should add test cases that cover these code paths though.
lld/MachO/InputFiles.cpp
492–493	Gotcha. I would also argue that we should expect people to upgrade LLD and clang in lockstep most of the time, so we don't need to worry too much about forwards compatibility

Added documentation comments
Removed unnecessary minOffset struct member. It was originally added to ensure that we can gracefully recover from (invalid) LOHs that span multiple InputSections. Let's add an explicit error instead.
Added failure cases to the AdrpAdd test, added test for AdrpAdrp
Made function/struct naming more consistent

Remove unnecessary DenseMap #include

Harbormaster completed remote builds in B171915: Diff 439848. Jun 24 2022, 1:13 PM

thakis added inline comments. Jun 25 2022, 5:24 PM

lld/MachO/InputFiles.cpp
492–493	+1
lld/MachO/Relocations.h
81	ld64 packs this into a single 64-bit word (cf `union LOH_arm64` in ld.hpp) and does a bunch of range checks to make sure that fits (cf `Section<arm64>::addLOH`, macho_relocatable_file.cpp). Could we do that too? Probably helps with perf (?)

Other than that, looks excellent to me.

lld/MachO/Arch/ARM64.cpp
232	`-2^20..2^20-1` is the range of a two's complement 21-bit immediate, which matches the bit pattern in writeAdr(). (Good; just explaining this code to myself.) Shouldn't `-1024*1024` be inclusive on the lower end though? (i.e. change `<=` to `<` (?)) Also, I think some short comment (`// adr has a 21-bit two's complement immediate`) would maybe be nice. Very nit: Maybe `(1 << 20)` is clearer than `1024 * 1024` here? But up to you.
261	nit: I'd find `0xfff` (with `ULL` suffix as needed) clearer than 4095
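
To make the immediate layout discussed above concrete, here is a sketch of the ADR encoding with a matching decoder. `encodeAdr` uses the same masks and shifts as `writeAdr` in this revision but returns the word instead of writing it; `decodeAdrDelta` is a hypothetical helper added only to show that the round trip preserves the sign.

```cpp
#include <cassert>
#include <cstdint>

// ADR layout: immlo in bits 30:29, immhi in bits 23:5, Rd in bits 4:0.
uint32_t encodeAdr(uint32_t dest, int32_t delta) {
  // adr has a 21-bit two's complement immediate: [-2^20, 2^20).
  assert(dest < 32 && delta >= -(1 << 20) && delta < (1 << 20));
  uint32_t bits = static_cast<uint32_t>(delta);
  uint32_t opcode = 0x10000000;
  uint32_t immHi = (bits & 0x001ffffc) << 3;
  uint32_t immLo = (bits & 0x00000003) << 29;
  return opcode | immHi | immLo | dest;
}

// Hypothetical inverse: reassembles the 21-bit immediate and sign-extends it.
int32_t decodeAdrDelta(uint32_t insn) {
  uint32_t imm = (((insn >> 5) & 0x7ffff) << 2) | ((insn >> 29) & 0x3);
  return (imm & 0x100000) ? static_cast<int32_t>(imm) - (1 << 21)
                          : static_cast<int32_t>(imm);
}
```

The lower bound is inclusive and the upper bound exclusive, which is the asymmetry the review comment asks about.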

int3 added inline comments. Jun 27 2022, 9:32 AM

lld/MachO/InputFiles.cpp
453	love the comments, thanks!
453	avoid using 'used for' twice in the same sentence
524	how about making `OptimizationHint::offsets` and `address` into `std::array<uint64_t, 3>`? Then we could just use the copy ctor here. Edit: just saw thakis' comment about LOH_arm64 below, I guess that would obviate the need for this
573	just in case the compiler doesn't hoist it out for us
lld/MachO/InputSection.cpp
221	I wonder if it would make the code a bit cleaner to just pass in `isec` to `applyOptimizationHints` instead of computing + storing `relocVA`. Would make it clearer that `applyOptimizationHints` is working on a per-subsection basis at least, plus allocate a bit less memory. (I suppose we could go one step further and extract `referentVA` from the buffer rather than storing it here, but that might not actually be more efficient...)
lld/test/MachO/loh-adrp-add.s
5	we should match against the warning messages too

We are down to 110 ms!

Benchmark 1: /Users/dani/Source/llvm-project/build_release/bin/ld64.lld @response.txt
  Time (mean ± σ):      3.828 s ±  0.021 s    [User: 5.182 s, System: 0.513 s]
  Range (min … max):    3.808 s …  3.877 s    10 runs
 
Benchmark 2: /Users/dani/Source/llvm-project/build_release/bin/ld64.lld @response.txt -ignore_optimization_hints
  Time (mean ± σ):      3.718 s ±  0.048 s    [User: 5.047 s, System: 0.496 s]
  Range (min … max):    3.672 s …  3.831 s    10 runs
 
Summary
  '/Users/dani/Source/llvm-project/build_release/bin/ld64.lld @response.txt -ignore_optimization_hints' ran
    1.03 ± 0.01 times faster than '/Users/dani/Source/llvm-project/build_release/bin/ld64.lld @response.txt'

Reduced the size of OptimizationHint to 16 bytes by storing only 16 bit offsets to the second and third addresses like ld64 does. The first address cannot be stored in a smaller integer because it's only transformed into a section offset after LOHs have been sorted.
Removed the redundant PerformedRelocation struct, now we only cache the resolved addresses.
Use O(reloc) linear search for the first address, so only the second addresses have to be searched in O(loh * log(reloc)) time.
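
The search strategy in the last item can be sketched as follows, assuming (as the code comments in this revision state) that relocations are sorted by descending offset and hints by ascending first address. `RelocFinder` and the one-field `Reloc` are hypothetical stand-ins for the revision's `OptimizationHintContext` and lld's `Reloc`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Reloc {
  uint64_t offset; // stand-in; lld's Reloc has more fields
};

class RelocFinder {
public:
  // `relocs` must be sorted by descending offset and outlive the finder.
  explicit RelocFinder(const std::vector<Reloc> &relocs)
      : relocs(relocs), cursor(relocs.rbegin()) {}

  // For first addresses, queried in non-decreasing order: the cursor only
  // moves forward, so all calls together cost O(number of relocations).
  const Reloc *findPrimary(uint64_t offset) {
    const auto end = relocs.rend();
    while (cursor != end && cursor->offset < offset)
      ++cursor;
    if (cursor == end || cursor->offset != offset)
      return nullptr;
    return &*cursor;
  }

  // For second and third addresses, which arrive in no particular order:
  // binary search over the descending range, O(log n) per call.
  const Reloc *find(uint64_t offset) const {
    auto it = std::lower_bound(
        relocs.begin(), relocs.end(), offset,
        [](const Reloc &r, uint64_t key) { return r.offset > key; });
    if (it == relocs.end() || it->offset != offset)
      return nullptr;
    return &*it;
  }

private:
  const std::vector<Reloc> &relocs;
  std::vector<Reloc>::const_reverse_iterator cursor;
};
```

The revision additionally checks the relocation right after the cursor before falling back to the binary search, since hints often refer to successive relocations; that shortcut is omitted here.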

BertalanD marked 15 inline comments as done. Jun 28 2022, 9:06 AM

BertalanD added inline comments.

lld/test/MachO/loh-adrp-add.s
5	I didn't end up adding warnings for the cases labeled "(invalid input)" here.

Harbormaster completed remote builds in B172496: Diff 440637. Jun 28 2022, 9:19 AM

just one question about the bounds check, otherwise this lgtm

lld/MachO/Arch/ARM64.cpp
171	imo `struct` should be reserved for POD types. Things with nontrivial methods should be `class` (with the public functions above the private data members) (alternatively you could keep this a `struct` and have the methods be free functions)
lld/MachO/InputFiles.cpp
526	is this supposed to be `int16_t`? The check as-is will never be false

Nice perf optimizations btw :)

int3 added inline comments. Jun 29 2022, 11:07 AM

lld/MachO/InputFiles.cpp
528–529	can we generate an object file that exercises this code path, or does `llvm-mc` refuse to produce one? (same question for the other error conditions)

Fixed bounds checking thinko
Added a test for too large offsets and LOHs spanning multiple sections. The other two failure cases can't be produced by the assembler as far as I can tell. Is the macro + single source file the nicest way to do it?
Turned OptimizationHintContext into a proper class.

int3 added inline comments. Jun 29 2022, 12:34 PM

lld/test/MachO/loh-invalid.s
1 ↗	(On Diff #441126)	can you put this file under `invalid/`?
> Is the macro + single source file the nicest way to do it?
you can use `split-file` to embed multiple test input files into one source file

Moved the new test to invalid/ and changed it to use split-file instead of the nasty .ifdefs

lgtm

This revision is now accepted and ready to land. Jun 29 2022, 12:58 PM

Harbormaster completed remote builds in B172855: Diff 441137. Jun 29 2022, 2:59 PM

BertalanD mentioned this in rGa3f67f0920ea: [lld-macho] Initial support for Linker Optimization Hints. Jun 29 2022, 9:29 PM

BertalanD added a commit: rGa3f67f0920ea: [lld-macho] Initial support for Linker Optimization Hints. Jun 29 2022, 9:37 PM

Made a mistake in the commit message, so it wasn't auto-closed. It has been committed as a3f67f0920ea.

Revision Contents

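Path	Size
lld/MachO/Arch/ARM64.cpp	203 lines
lld/MachO/Config.h	1 line
lld/MachO/Driver.cpp	1 line
lld/MachO/InputFiles.h	2 lines
lld/MachO/InputFiles.cpp	154 lines
lld/MachO/InputSection.h	1 line
lld/MachO/InputSection.cpp	14 lines
lld/MachO/Options.td	3 lines
lld/MachO/Relocations.h	8 lines
lld/MachO/Target.h	6 lines
lld/test/MachO/invalid/invalid-loh.s	39 lines
lld/test/MachO/loh-adrp-add.s	98 lines
lld/test/MachO/loh-adrp-adrp.s	56 lines
llvm/include/llvm/BinaryFormat/MachO.h	11 lines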

Diff 441137

lld/MachO/Arch/ARM64.cpp

[... 30 lines not shown ...]
struct ARM64 : ARM64Common {		struct ARM64 : ARM64Common {
ARM64();		ARM64();
void writeStub(uint8_t *buf, const Symbol &) const override;		void writeStub(uint8_t *buf, const Symbol &) const override;
void writeStubHelperHeader(uint8_t *buf) const override;		void writeStubHelperHeader(uint8_t *buf) const override;
void writeStubHelperEntry(uint8_t *buf, const Symbol &,		void writeStubHelperEntry(uint8_t *buf, const Symbol &,
uint64_t entryAddr) const override;		uint64_t entryAddr) const override;
const RelocAttrs &getRelocAttrs(uint8_t type) const override;		const RelocAttrs &getRelocAttrs(uint8_t type) const override;
void populateThunk(InputSection *thunk, Symbol *funcSym) override;		void populateThunk(InputSection *thunk, Symbol *funcSym) override;
		void applyOptimizationHints(uint8_t *, const ConcatInputSection *,
		ArrayRef<uint64_t>) const override;
};		};

} // namespace		} // namespace

// Random notes on reloc types:		// Random notes on reloc types:
// ADDEND always pairs with BRANCH26, PAGE21, or PAGEOFF12		// ADDEND always pairs with BRANCH26, PAGE21, or PAGEOFF12
// POINTER_TO_GOT: ld64 supports a 4-byte pc-relative form as well as an 8-byte		// POINTER_TO_GOT: ld64 supports a 4-byte pc-relative form as well as an 8-byte
// absolute version of this relocation. The semantics of the absolute relocation		// absolute version of this relocation. The semantics of the absolute relocation
[... 98 lines not shown ...]
ARM64::ARM64() : ARM64Common(LP64()) {
modeDwarfEncoding = UNWIND_ARM64_MODE_DWARF;		modeDwarfEncoding = UNWIND_ARM64_MODE_DWARF;
subtractorRelocType = ARM64_RELOC_SUBTRACTOR;		subtractorRelocType = ARM64_RELOC_SUBTRACTOR;
unsignedRelocType = ARM64_RELOC_UNSIGNED;		unsignedRelocType = ARM64_RELOC_UNSIGNED;

stubHelperHeaderSize = sizeof(stubHelperHeaderCode);		stubHelperHeaderSize = sizeof(stubHelperHeaderCode);
stubHelperEntrySize = sizeof(stubHelperEntryCode);		stubHelperEntrySize = sizeof(stubHelperEntryCode);
}		}

		namespace {
		struct Adrp {
		uint32_t destRegister;
		};

		struct Add {
		uint8_t destRegister;
		uint8_t srcRegister;
		uint32_t addend;
		};

		struct PerformedReloc {
		const Reloc &rel;
		uint64_t referentVA;
		};

		class OptimizationHintContext {
		public:
		OptimizationHintContext(uint8_t *buf, const ConcatInputSection *isec,
		ArrayRef<uint64_t> relocTargets)
		: buf(buf), isec(isec), relocTargets(relocTargets),
		relocIt(isec->relocs.rbegin()) {}

		void applyAdrpAdd(const OptimizationHint &);
		void applyAdrpAdrp(const OptimizationHint &);

		private:
		uint8_t *buf;
		const ConcatInputSection *isec;
		ArrayRef<uint64_t> relocTargets;
		std::vector<Reloc>::const_reverse_iterator relocIt;

		uint64_t getRelocTarget(const Reloc &);

		Optional<PerformedReloc> findPrimaryReloc(uint64_t offset);
		Optional<PerformedReloc> findReloc(uint64_t offset);
		};
		} // namespace

		static bool parseAdrp(uint32_t insn, Adrp &adrp) {
		if ((insn & 0x9f000000) != 0x90000000)
		return false;
		adrp.destRegister = insn & 0x1f;
		return true;
		}

		static bool parseAdd(uint32_t insn, Add &add) {
		if ((insn & 0xffc00000) != 0x91000000)
		return false;
		add.destRegister = insn & 0x1f;
		add.srcRegister = (insn >> 5) & 0x1f;
		add.addend = (insn >> 10) & 0xfff;
		return true;
		}

		static void writeAdr(void *loc, uint32_t dest, int32_t delta) {
		uint32_t opcode = 0x10000000;
		uint32_t immHi = (delta & 0x001ffffc) << 3;
		uint32_t immLo = (delta & 0x00000003) << 29;
		write32le(loc, opcode | immHi | immLo | dest);
		}

		static void writeNop(void *loc) { write32le(loc, 0xd503201f); }

		uint64_t OptimizationHintContext::getRelocTarget(const Reloc &reloc) {
		size_t relocIdx = &reloc - isec->relocs.data();
		return relocTargets[relocIdx];
		}

		// Optimization hints are sorted in a monotonically increasing order by their
		// first address as are relocations (albeit in decreasing order), so if we keep
		// a pointer around to the last found relocation, we don't have to do a full
		// binary search every time.
		Optional<PerformedReloc>
		OptimizationHintContext::findPrimaryReloc(uint64_t offset) {
		const auto end = isec->relocs.rend();
		while (relocIt != end && relocIt->offset < offset)
		++relocIt;
		if (relocIt == end || relocIt->offset != offset)
		return None;
		return PerformedReloc{*relocIt, getRelocTarget(*relocIt)};
		}

		// The second and third addresses of optimization hints have no such
		// monotonicity as the first, so we search the entire range of relocations.
		Optional<PerformedReloc> OptimizationHintContext::findReloc(uint64_t offset) {
		// Optimization hints often apply to successive relocations, so we check for
		// that first before doing a full binary search.
		auto end = isec->relocs.rend();
		if (relocIt < end - 1 && (relocIt + 1)->offset == offset)
		return PerformedReloc{*(relocIt + 1), getRelocTarget(*(relocIt + 1))};

		auto reloc = lower_bound(isec->relocs, offset,
		[](const Reloc &reloc, uint64_t offset) {
		return offset < reloc.offset;
		});

		if (reloc == isec->relocs.end() || reloc->offset != offset)
		return None;
		return PerformedReloc{*reloc, getRelocTarget(*reloc)};
		}

		// Transforms a pair of adrp+add instructions into an adr instruction if the
		// target is within the +/- 1 MiB range allowed by the adr's 21 bit signed
		// immediate offset.
		//
		// adrp xN, _foo@PAGE
		// add xM, xN, _foo@PAGEOFF
		// ->
		// adr xM, _foo
		// nop
		void OptimizationHintContext::applyAdrpAdd(const OptimizationHint &hint) {
		uint32_t ins1 = read32le(buf + hint.offset0);
		uint32_t ins2 = read32le(buf + hint.offset0 + hint.delta[0]);
		Adrp adrp;
		if (!parseAdrp(ins1, adrp))
		return;
		Add add;
		if (!parseAdd(ins2, add))
		return;
		if (adrp.destRegister != add.srcRegister)
		return;

		Optional<PerformedReloc> rel1 = findPrimaryReloc(hint.offset0);
		Optional<PerformedReloc> rel2 = findReloc(hint.offset0 + hint.delta[0]);
		if (!rel1 || !rel2)
		return;
		if (rel1->referentVA != rel2->referentVA)
		return;
		int64_t delta = rel1->referentVA - rel1->rel.offset - isec->getVA();
		if (delta >= (1 << 20) || delta < -(1 << 20))
		return;

		writeAdr(buf + hint.offset0, add.destRegister, delta);
		writeNop(buf + hint.offset0 + hint.delta[0]);
		}

		// Transforms two adrp instructions into a single adrp if their referent
		// addresses are located on the same 4096 byte page.
		//
		// adrp xN, _foo@PAGE
		// adrp xN, _bar@PAGE
		// ->
		// adrp xN, _foo@PAGE
		// nop
		void OptimizationHintContext::applyAdrpAdrp(const OptimizationHint &hint) {
		uint32_t ins1 = read32le(buf + hint.offset0);
		uint32_t ins2 = read32le(buf + hint.offset0 + hint.delta[0]);
		Adrp adrp1, adrp2;
		if (!parseAdrp(ins1, adrp1) || !parseAdrp(ins2, adrp2))
		return;
		if (adrp1.destRegister != adrp2.destRegister)
		return;

		Optional<PerformedReloc> rel1 = findPrimaryReloc(hint.offset0);
		Optional<PerformedReloc> rel2 = findReloc(hint.offset0 + hint.delta[0]);
		if (!rel1 || !rel2)
		return;
		if ((rel1->referentVA & ~0xfffULL) != (rel2->referentVA & ~0xfffULL))
		return;

		writeNop(buf + hint.offset0 + hint.delta[0]);
		}

		void ARM64::applyOptimizationHints(uint8_t *buf, const ConcatInputSection *isec,
		ArrayRef<uint64_t> relocTargets) const {
		assert(isec);
		assert(relocTargets.size() == isec->relocs.size());

		// Note: Some of these optimizations might not be valid when shared regions
		// are in use. Will need to revisit this if splitSegInfo is added.

		OptimizationHintContext ctx1(buf, isec, relocTargets);
		for (const OptimizationHint &hint : isec->optimizationHints) {
		switch (hint.type) {
		case LOH_ARM64_ADRP_ADRP:
		// This is done in another pass because the other optimization hints
		// might cause its targets to be turned into NOPs.
		break;
		case LOH_ARM64_ADRP_LDR:
		case LOH_ARM64_ADRP_ADD_LDR:
		case LOH_ARM64_ADRP_LDR_GOT_LDR:
		case LOH_ARM64_ADRP_ADD_STR:
		case LOH_ARM64_ADRP_LDR_GOT_STR:
		// TODO: Implement these
		break;
		case LOH_ARM64_ADRP_ADD:
		ctx1.applyAdrpAdd(hint);
		break;
		case LOH_ARM64_ADRP_LDR_GOT:
		// TODO: Implement this as well
		break;
		}
		}

		OptimizationHintContext ctx2(buf, isec, relocTargets);
		for (const OptimizationHint &hint : isec->optimizationHints)
		if (hint.type == LOH_ARM64_ADRP_ADRP)
		ctx2.applyAdrpAdrp(hint);
		}

TargetInfo *macho::createARM64TargetInfo() {		TargetInfo *macho::createARM64TargetInfo() {
static ARM64 t;		static ARM64 t;
return &t;		return &t;
}		}

lld/MachO/Config.h

[... 124 lines not shown ...]
struct Configuration {
bool emitBitcodeBundle = false;		bool emitBitcodeBundle = false;
bool emitDataInCodeInfo = false;		bool emitDataInCodeInfo = false;
bool emitEncryptionInfo = false;		bool emitEncryptionInfo = false;
bool timeTraceEnabled = false;		bool timeTraceEnabled = false;
bool dataConst = false;		bool dataConst = false;
bool dedupLiterals = true;		bool dedupLiterals = true;
bool omitDebugInfo = false;		bool omitDebugInfo = false;
bool warnDylibInstallName = false;		bool warnDylibInstallName = false;
		bool ignoreOptimizationHints = false;
// Temporary config flag that will be removed once we have fully implemented		// Temporary config flag that will be removed once we have fully implemented
// support for __eh_frame.		// support for __eh_frame.
bool parseEhFrames = false;		bool parseEhFrames = false;
uint32_t headerPad;		uint32_t headerPad;
uint32_t dylibCompatibilityVersion = 0;		uint32_t dylibCompatibilityVersion = 0;
uint32_t dylibCurrentVersion = 0;		uint32_t dylibCurrentVersion = 0;
uint32_t timeTraceGranularity = 500;		uint32_t timeTraceGranularity = 500;
unsigned optimize;		unsigned optimize;
▲ Show 20 Lines • Show All 79 Lines • Show Last 20 Lines

lld/MachO/Driver.cpp

Show First 20 Lines • Show All 1,295 Lines • ▼ Show 20 Lines	bool macho::link(ArrayRef<const char *> argsArr, llvm::raw_ostream &stdoutOS,
config->emitDataInCodeInfo =		config->emitDataInCodeInfo =
args.hasFlag(OPT_data_in_code_info, OPT_no_data_in_code_info, true);		args.hasFlag(OPT_data_in_code_info, OPT_no_data_in_code_info, true);
config->icfLevel = getICFLevel(args);		config->icfLevel = getICFLevel(args);
config->dedupLiterals =		config->dedupLiterals =
args.hasFlag(OPT_deduplicate_literals, OPT_icf_eq, false) ||		args.hasFlag(OPT_deduplicate_literals, OPT_icf_eq, false) ||
config->icfLevel != ICFLevel::none;		config->icfLevel != ICFLevel::none;
config->warnDylibInstallName = args.hasFlag(		config->warnDylibInstallName = args.hasFlag(
OPT_warn_dylib_install_name, OPT_no_warn_dylib_install_name, false);		OPT_warn_dylib_install_name, OPT_no_warn_dylib_install_name, false);
		config->ignoreOptimizationHints = args.hasArg(OPT_ignore_optimization_hints);
config->callGraphProfileSort = args.hasFlag(		config->callGraphProfileSort = args.hasFlag(
OPT_call_graph_profile_sort, OPT_no_call_graph_profile_sort, true);		OPT_call_graph_profile_sort, OPT_no_call_graph_profile_sort, true);
config->printSymbolOrder = args.getLastArgValue(OPT_print_symbol_order);		config->printSymbolOrder = args.getLastArgValue(OPT_print_symbol_order);
config->parseEhFrames = static_cast<bool>(getenv("LLD_IN_TEST"));		config->parseEhFrames = static_cast<bool>(getenv("LLD_IN_TEST"));

// FIXME: Add a commandline flag for this too.		// FIXME: Add a commandline flag for this too.
config->zeroModTime = getenv("ZERO_AR_DATE");		config->zeroModTime = getenv("ZERO_AR_DATE");

▲ Show 20 Lines • Show All 327 Lines • Show Last 20 Lines

lld/MachO/InputFiles.h

Show First 20 Lines • Show All 167 Lines • ▼ Show 20 Lines	public:

llvm::DWARFUnit *compileUnit = nullptr;		llvm::DWARFUnit *compileUnit = nullptr;
std::unique_ptr<lld::DWARFCache> dwarfCache;		std::unique_ptr<lld::DWARFCache> dwarfCache;
Section *addrSigSection = nullptr;		Section *addrSigSection = nullptr;
const uint32_t modTime;		const uint32_t modTime;
std::vector<ConcatInputSection *> debugSections;		std::vector<ConcatInputSection *> debugSections;
std::vector<CallGraphEntry> callGraph;		std::vector<CallGraphEntry> callGraph;
llvm::DenseMap<ConcatInputSection *, FDE> fdes;		llvm::DenseMap<ConcatInputSection *, FDE> fdes;
		std::vector<OptimizationHint> optimizationHints;

private:		private:
llvm::once_flag initDwarf;		llvm::once_flag initDwarf;
template <class LP> void parseLazy();		template <class LP> void parseLazy();
template <class SectionHeader> void parseSections(ArrayRef<SectionHeader>);		template <class SectionHeader> void parseSections(ArrayRef<SectionHeader>);
template <class LP>		template <class LP>
void parseSymbols(ArrayRef<typename LP::section> sectionHeaders,		void parseSymbols(ArrayRef<typename LP::section> sectionHeaders,
ArrayRef<typename LP::nlist> nList, const char *strtab,		ArrayRef<typename LP::nlist> nList, const char *strtab,
bool subsectionsViaSymbols);		bool subsectionsViaSymbols);
template <class NList>		template <class NList>
Symbol *parseNonSectionSymbol(const NList &sym, StringRef name);		Symbol *parseNonSectionSymbol(const NList &sym, StringRef name);
template <class SectionHeader>		template <class SectionHeader>
void parseRelocations(ArrayRef<SectionHeader> sectionHeaders,		void parseRelocations(ArrayRef<SectionHeader> sectionHeaders,
const SectionHeader &, Section &);		const SectionHeader &, Section &);
void parseDebugInfo();		void parseDebugInfo();
		void parseOptimizationHints(ArrayRef<uint8_t> data);
void splitEhFrames(ArrayRef<uint8_t> dataArr, Section &ehFrameSection);		void splitEhFrames(ArrayRef<uint8_t> dataArr, Section &ehFrameSection);
void registerCompactUnwind(Section &compactUnwindSection);		void registerCompactUnwind(Section &compactUnwindSection);
void registerEhFrames(Section &ehFrameSection);		void registerEhFrames(Section &ehFrameSection);
};		};

// command-line -sectcreate file		// command-line -sectcreate file
class OpaqueFile final : public InputFile {		class OpaqueFile final : public InputFile {
public:		public:
▲ Show 20 Lines • Show All 140 Lines • Show Last 20 Lines

lld/MachO/InputFiles.cpp

Show First 20 Lines • Show All 59 Lines • ▼ Show 20 Lines

#include "lld/Common/CommonLinkerContext.h" #include "lld/Common/CommonLinkerContext.h"

#include "lld/Common/DWARF.h" #include "lld/Common/DWARF.h"

#include "lld/Common/Reproduce.h" #include "lld/Common/Reproduce.h"

#include "llvm/ADT/iterator.h" #include "llvm/ADT/iterator.h"

#include "llvm/BinaryFormat/MachO.h" #include "llvm/BinaryFormat/MachO.h"

#include "llvm/LTO/LTO.h" #include "llvm/LTO/LTO.h"

#include "llvm/Support/BinaryStreamReader.h" #include "llvm/Support/BinaryStreamReader.h"

#include "llvm/Support/Endian.h" #include "llvm/Support/Endian.h"

#include "llvm/Support/LEB128.h"

#include "llvm/Support/MemoryBuffer.h" #include "llvm/Support/MemoryBuffer.h"

#include "llvm/Support/Path.h" #include "llvm/Support/Path.h"

#include "llvm/Support/TarWriter.h" #include "llvm/Support/TarWriter.h"

#include "llvm/Support/TimeProfiler.h" #include "llvm/Support/TimeProfiler.h"

#include "llvm/TextAPI/Architecture.h" #include "llvm/TextAPI/Architecture.h"

#include "llvm/TextAPI/InterfaceFile.h" #include "llvm/TextAPI/InterfaceFile.h"

#include <type_traits> #include <type_traits>

▲ Show 20 Lines • Show All 368 Lines • ▼ Show 20 Lines static Defined *findSymbolAtOffset(const ConcatInputSection *isec,

// The offset should point at the exact address of a symbol (with no addend.) // The offset should point at the exact address of a symbol (with no addend.)

if (it == isec->symbols.end() || (*it)->value != off) { if (it == isec->symbols.end() || (*it)->value != off) {

assert(isec->wasCoalesced); assert(isec->wasCoalesced);

return nullptr; return nullptr;

} }

return *it; return *it;

} }

// Linker optimization hints mark a sequence of instructions used for

int3Unsubmitted

Done

Can you include a comment block explaining what optimization hints are? Something like what you put down in the commit message would be good. Would also love an explanation of why minOffset is important.

we don't always have the most detailed comments, but if you check the top of InputFiles.cpp, there's a pretty detailed comment about the Mach-O format -- that's a good ideal to strive for :)

BertalanDAuthorUnsubmitted

Done

Comments are a good idea (especially since this feature is a bit obscure -- I personally didn't know about it before I looked through the GitHub issues).

I'm going to add a general description to this function. Additionally, I'll explain each kind individually in the corresponding performFoo function.

int3Unsubmitted

Done

love the comments, thanks!

int3Unsubmitted

Done

return *it;

}

- // Linker optimization hints are used for marking a sequence of instructions

+ // Linker optimization hints mark a sequence of instructions

// used for synthesizing an address that can be transformed into a faster

avoid using 'used for' twice in the same sentence

// synthesizing an address that can be transformed into a faster sequence. The

// transformations depend on conditions that are determined at link time, like

// the distance to the referenced symbol or its alignment.

// Each hint has a type and refers to 2 or 3 instructions. Each of those

// instructions must have a corresponding relocation. After addresses have been

// finalized and relocations have been performed, we check if the requirements

// hold, and perform the optimizations if they do.

// Similar linker relaxations exist for ELF as well, with the difference being

// that the explicit marking allows for the relaxation of non-consecutive

// relocations too.

// The specific types of hints are documented in Arch/ARM64.cpp

void ObjFile::parseOptimizationHints(ArrayRef<uint8_t> data) {

auto expectedArgCount = [](uint8_t type) {

switch (type) {

case LOH_ARM64_ADRP_ADRP:

case LOH_ARM64_ADRP_LDR:

case LOH_ARM64_ADRP_ADD:

case LOH_ARM64_ADRP_LDR_GOT:

return 2;

case LOH_ARM64_ADRP_ADD_LDR:

case LOH_ARM64_ADRP_ADD_STR:

case LOH_ARM64_ADRP_LDR_GOT_LDR:

case LOH_ARM64_ADRP_LDR_GOT_STR:

return 3;

}

return -1;

};

// Each hint contains at least 4 ULEB128-encoded fields, so in the worst case,

// there are data.size() / 4 LOHs. It's a huge overestimation though, as

// offsets are unlikely to fall in the 0-127 byte range, so we pre-allocate

// half as much.

optimizationHints.reserve(data.size() / 8);

for (const uint8_t *p = data.begin(); p < data.end();) {

const ptrdiff_t inputOffset = p - data.begin();

unsigned int n = 0;

int3Unsubmitted

Done

do we actually expect this to happen in practice? can we just error out and return early?

int3: do we actually expect this to happen in practice? can we just error out and return early?

BertalanDAuthorUnsubmitted

Done

I added it for forward compatibility.

However, it looks like the LLVM pass that emits these hints hasn't been changed substantially for a long time, so it's probably safe to say that we won't be getting any new types that will need to be handled.

int3Unsubmitted

Done

Gotcha. I would also argue that we should expect people to upgrade LLD and clang in lockstep most of the time, so we don't need to worry too much about forwards compatibility

thakisUnsubmitted

Done

thakis: +1

uint8_t type = decodeULEB128(p, &n, data.end());

p += n;

// An entry of type 0 terminates the list.

if (type == 0)

break;

int expectedCount = expectedArgCount(type);

if (LLVM_UNLIKELY(expectedCount == -1)) {

error("Linker optimization hint at offset " + Twine(inputOffset) +

" has unknown type " + Twine(type));

return;

}

uint8_t argCount = decodeULEB128(p, &n, data.end());

p += n;

grandinjUnsubmitted

Done

I'm guessing if you run a profiler over this (like perf), you will see significant time spent in resizing/re-allocating the vector.
You could avoid this by calling reserve before the loop to reserve storage.

Alternatively, just use std::deque instead. It handles this case well, and doesn't need to re-allocate/resize.

BertalanDAuthorUnsubmitted

Done

We need optimizationHints to be a contiguous block of memory if we want to store an ArrayRef<OptimizationHint> in each (sub)section. That way, it's faster than making many smaller vector<OptimizationHint> allocations for each InputSection.

The data is encoded in the variable length ULEB128 format, so we don't exactly know how many elements there are until we've parsed the section fully and encountered the terminating element. But we can make an (over)estimation.

if (LLVM_UNLIKELY(argCount != expectedCount)) {

error("Linker optimization hint at offset " + Twine(inputOffset) +

" has " + Twine(argCount) + " arguments instead of the expected " +

Twine(expectedCount));

return;

}

uint64_t offset0 = decodeULEB128(p, &n, data.end());

p += n;

int16_t delta[2];

for (int i = 0; i < argCount - 1; ++i) {

uint64_t address = decodeULEB128(p, &n, data.end());

p += n;

int3Unsubmitted

Not Done

how about making OptimizationHint::offsets and address into std::array<uint64_t, 3>? Then we could just use the copy ctor here

edit: just saw thakis' comment about LOH_arm64 below, I guess that would obviate the need for this

int64_t d = address - offset0;

if (LLVM_UNLIKELY(d > std::numeric_limits<int16_t>::max() ||

int3Unsubmitted

Not Done

is this supposed to be int16_t? The check as-is will never be false

d < std::numeric_limits<int16_t>::min())) {

error("Linker optimization hint at offset " + Twine(inputOffset) +

" has addresses too far apart");

int3Unsubmitted

Not Done

can we generate an object file that exercises this code path, or does llvm-mc refuse to produce one? (same question for the other error conditions)

return;

}

delta[i] = d;

}

optimizationHints.push_back({offset0, {delta[0], delta[1]}, type});

}

// We sort the per-object vector of optimization hints so each section only

// needs to hold an ArrayRef to a contiguous range of hints.

llvm::sort(optimizationHints,

[](const OptimizationHint &a, const OptimizationHint &b) {

return a.offset0 < b.offset0;

});

auto section = sections.begin();

auto subsection = (*section)->subsections.begin();

uint64_t subsectionBase = 0;

uint64_t subsectionEnd = 0;

auto updateAddr = [&]() {

subsectionBase = (*section)->addr + subsection->offset;

subsectionEnd = subsectionBase + subsection->isec->getSize();

};

auto advanceSubsection = [&]() {

if (section == sections.end())

return;

++subsection;

if (subsection == (*section)->subsections.end()) {

++section;

if (section == sections.end())

return;

subsection = (*section)->subsections.begin();

}

};

updateAddr();

auto hintStart = optimizationHints.begin();

for (auto hintEnd = hintStart, end = optimizationHints.end(); hintEnd != end;

++hintEnd) {

if (hintEnd->offset0 >= subsectionEnd) {

subsection->isec->optimizationHints =

ArrayRef<OptimizationHint>(&*hintStart, hintEnd - hintStart);

int3Unsubmitted

Done

}

- for (int i = 0; i < expectedArgCount(hintEnd->type); ++i) {

+ for (int i = 0, end = expectedArgCount(hintEnd->type); i < end; ++i) {

if (LLVM_UNLIKELY(hintEnd->offsets[i] < subsectionBase ||

just in case the compiler doesn't hoist it out for us

hintStart = hintEnd;

while (hintStart->offset0 >= subsectionEnd) {

advanceSubsection();

if (section == sections.end())

break;

updateAddr();

}

hintEnd->offset0 -= subsectionBase;

for (int i = 0, count = expectedArgCount(hintEnd->type); i < count - 1;

++i) {

if (LLVM_UNLIKELY(

hintEnd->delta[i] < -static_cast<int64_t>(hintEnd->offset0) ||

hintEnd->delta[i] >=

static_cast<int64_t>(subsectionEnd - hintEnd->offset0))) {

error("Linker optimization hint spans multiple sections");

return;

}

if (section != sections.end())

subsection->isec->optimizationHints = ArrayRef<OptimizationHint>(

&*hintStart, optimizationHints.end() - hintStart);

}

template <class SectionHeader> template <class SectionHeader>

static bool validateRelocationInfo(InputFile *file, const SectionHeader &sec, static bool validateRelocationInfo(InputFile *file, const SectionHeader &sec,

relocation_info rel) { relocation_info rel) {

const RelocAttrs &relocAttrs = target->getRelocAttrs(rel.r_type); const RelocAttrs &relocAttrs = target->getRelocAttrs(rel.r_type);

bool valid = true; bool valid = true;

auto message = [relocAttrs, file, sec, rel, &valid](const Twine &diagnostic) { auto message = [relocAttrs, file, sec, rel, &valid](const Twine &diagnostic) {

valid = false; valid = false;

return (relocAttrs.name + " relocation " + diagnostic + " at offset " + return (relocAttrs.name + " relocation " + diagnostic + " at offset " +

▲ Show 20 Lines • Show All 484 Lines • ▼ Show 20 Lines template <class LP> void ObjFile::parse() {

} }

// The relocations may refer to the symbols, so we parse them after we have // The relocations may refer to the symbols, so we parse them after we have

// parsed all the symbols. // parsed all the symbols.

for (size_t i = 0, n = sections.size(); i < n; ++i) for (size_t i = 0, n = sections.size(); i < n; ++i)

if (!sections[i]->subsections.empty()) if (!sections[i]->subsections.empty())

parseRelocations(sectionHeaders, sectionHeaders[i], *sections[i]); parseRelocations(sectionHeaders, sectionHeaders[i], *sections[i]);

if (!config->ignoreOptimizationHints)

if (auto *cmd = findCommand<linkedit_data_command>(

hdr, LC_LINKER_OPTIMIZATION_HINT))

parseOptimizationHints({buf + cmd->dataoff, cmd->datasize});

parseDebugInfo(); parseDebugInfo();

Section *ehFrameSection = nullptr; Section *ehFrameSection = nullptr;

Section *compactUnwindSection = nullptr; Section *compactUnwindSection = nullptr;

for (Section *sec : sections) { for (Section *sec : sections) {

Section **s = StringSwitch<Section **>(sec->name) Section **s = StringSwitch<Section **>(sec->name)

.Case(section_names::compactUnwind, &compactUnwindSection) .Case(section_names::compactUnwind, &compactUnwindSection)

.Case(section_names::ehFrame, &ehFrameSection) .Case(section_names::ehFrame, &ehFrameSection)

▲ Show 20 Lines • Show All 1,066 Lines • Show Last 20 Lines
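
The record layout that `parseOptimizationHints` walks (a ULEB128 type, a ULEB128 argument count, then that many ULEB128 addresses, with a type of 0 ending the list) can be sketched without any LLVM dependencies. The sketch below is an assumption-labeled illustration, not the patch's implementation: `RawHint`, `readULEB128` and `parseHints` are hypothetical names, the decoder is hand-written instead of using `llvm::decodeULEB128`, and it does not validate the argument count against the hint type as the real code does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical record type for this sketch only.
struct RawHint {
  uint64_t type;
  std::vector<uint64_t> addresses;
};

// Decodes one ULEB128 value starting at data[pos] and advances pos past it.
// Returns std::nullopt if the input ends before the value is complete.
std::optional<uint64_t> readULEB128(const std::vector<uint8_t> &data,
                                    size_t &pos) {
  assert(pos <= data.size());
  uint64_t value = 0;
  for (unsigned shift = 0; pos < data.size() && shift < 64; shift += 7) {
    uint8_t byte = data[pos++];
    value |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) // High bit clear marks the last byte.
      return value;
  }
  return std::nullopt;
}

// Parses a stream of (type, argCount, address...) records.
// A record whose type is 0 terminates the list.
std::optional<std::vector<RawHint>>
parseHints(const std::vector<uint8_t> &data) {
  std::vector<RawHint> hints;
  size_t pos = 0;
  while (pos < data.size()) {
    std::optional<uint64_t> type = readULEB128(data, pos);
    if (!type)
      return std::nullopt;
    if (*type == 0)
      break;
    std::optional<uint64_t> argCount = readULEB128(data, pos);
    if (!argCount)
      return std::nullopt;
    RawHint hint{*type, {}};
    for (uint64_t i = 0; i < *argCount; ++i) {
      std::optional<uint64_t> address = readULEB128(data, pos);
      if (!address)
        return std::nullopt; // Truncated record.
      hint.addresses.push_back(*address);
    }
    hints.push_back(std::move(hint));
  }
  return hints;
}
```

Because every field is variable-length, the number of records is only known after the whole buffer has been walked, which is why the patch reserves an estimate up front instead of an exact count.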

lld/MachO/InputSection.h

Show First 20 Lines • Show All 77 Lines • ▼ Show 20 Lines	public:
// keep the address of the symbol(s) in this section unique in the final		// keep the address of the symbol(s) in this section unique in the final
// binary ?		// binary ?
bool keepUnique = false;		bool keepUnique = false;
uint32_t align = 1;		uint32_t align = 1;

OutputSection *parent = nullptr;		OutputSection *parent = nullptr;
ArrayRef<uint8_t> data;		ArrayRef<uint8_t> data;
std::vector<Reloc> relocs;		std::vector<Reloc> relocs;
		ArrayRef<OptimizationHint> optimizationHints;
// The symbols that belong to this InputSection, sorted by value. With		// The symbols that belong to this InputSection, sorted by value. With
// .subsections_via_symbols, there is typically only one element here.		// .subsections_via_symbols, there is typically only one element here.
llvm::TinyPtrVector<Defined *> symbols;		llvm::TinyPtrVector<Defined *> symbols;

protected:		protected:
const Section &section;		const Section &section;

const Defined *getContainingSymbol(uint64_t off) const;		const Defined *getContainingSymbol(uint64_t off) const;
▲ Show 20 Lines • Show All 259 Lines • Show Last 20 Lines
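
The `ArrayRef<OptimizationHint>` member added here only works because the per-object hint vector is sorted by offset, so each subsection's hints form one contiguous run. A rough sketch of that slicing idea follows, using plain indices instead of `ArrayRef`. `Range` and `sliceHints` are hypothetical names, and the sketch assumes subsections are given in ascending, non-overlapping address order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical type for this sketch: [begin, end) address range of a subsection.
struct Range {
  uint64_t begin;
  uint64_t end;
};

// For each subsection, returns the [first, last) index range of the hints
// whose first address falls inside it. Both inputs must be in ascending order.
std::vector<std::pair<size_t, size_t>>
sliceHints(const std::vector<uint64_t> &sortedOffsets,
           const std::vector<Range> &subsections) {
  assert(std::is_sorted(sortedOffsets.begin(), sortedOffsets.end()));
  std::vector<std::pair<size_t, size_t>> slices;
  size_t i = 0;
  const size_t n = sortedOffsets.size();
  for (const Range &r : subsections) {
    // Skip hints that fall into a gap before this subsection.
    while (i < n && sortedOffsets[i] < r.begin)
      ++i;
    size_t first = i;
    // Consume every hint that starts inside this subsection.
    while (i < n && sortedOffsets[i] < r.end)
      ++i;
    slices.emplace_back(first, i);
  }
  return slices;
}
```

A single pass over both sorted sequences is enough, so the cost is linear in the number of hints plus the number of subsections, and no per-section allocation is needed.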

lld/MachO/InputSection.cpp

Show All 23 Lines
using namespace llvm::MachO;		using namespace llvm::MachO;
using namespace llvm::support;		using namespace llvm::support;
using namespace lld;		using namespace lld;
using namespace lld::macho;		using namespace lld::macho;

// Verify ConcatInputSection's size on 64-bit builds. The size of std::vector		// Verify ConcatInputSection's size on 64-bit builds. The size of std::vector
// can differ based on STL debug levels (e.g. iterator debugging on MSVC's STL),		// can differ based on STL debug levels (e.g. iterator debugging on MSVC's STL),
// so account for that.		// so account for that.
static_assert(sizeof(void *) != 8 ||		static_assert(sizeof(void *) != 8 || sizeof(ConcatInputSection) ==
sizeof(ConcatInputSection) == sizeof(std::vector<Reloc>) + 88,		sizeof(std::vector<Reloc>) + 104,
"Try to minimize ConcatInputSection's size, we create many "		"Try to minimize ConcatInputSection's size, we create many "
"instances of it");		"instances of it");

std::vector<ConcatInputSection *> macho::inputSections;		std::vector<ConcatInputSection *> macho::inputSections;

uint64_t InputSection::getFileSize() const {		uint64_t InputSection::getFileSize() const {
return isZeroFill(getFlags()) ? 0 : getSize();		return isZeroFill(getFlags()) ? 0 : getSize();
}		}
▲ Show 20 Lines • Show All 130 Lines • ▼ Show 20 Lines
void ConcatInputSection::writeTo(uint8_t *buf) {		void ConcatInputSection::writeTo(uint8_t *buf) {
assert(!shouldOmitFromOutput());		assert(!shouldOmitFromOutput());

if (getFileSize() == 0)		if (getFileSize() == 0)
return;		return;

memcpy(buf, data.data(), data.size());		memcpy(buf, data.data(), data.size());

		std::vector<uint64_t> relocTargets;
		if (!optimizationHints.empty())
		relocTargets.reserve(relocs.size());

for (size_t i = 0; i < relocs.size(); i++) {		for (size_t i = 0; i < relocs.size(); i++) {
const Reloc &r = relocs[i];		const Reloc &r = relocs[i];
uint8_t *loc = buf + r.offset;		uint8_t *loc = buf + r.offset;
uint64_t referentVA = 0;		uint64_t referentVA = 0;
if (target->hasAttr(r.type, RelocAttrBits::SUBTRAHEND)) {		if (target->hasAttr(r.type, RelocAttrBits::SUBTRAHEND)) {
const Symbol *fromSym = r.referent.get<Symbol *>();		const Symbol *fromSym = r.referent.get<Symbol *>();
const Reloc &minuend = relocs[++i];		const Reloc &minuend = relocs[++i];
uint64_t minuendVA;		uint64_t minuendVA;
Show All 19 Lines	if (target->hasAttr(r.type, RelocAttrBits::SUBTRAHEND)) {
if (isa<Defined>(referentSym))		if (isa<Defined>(referentSym))
referentVA -= firstTLVDataSection->addr;		referentVA -= firstTLVDataSection->addr;
}		}
} else if (auto *referentIsec = r.referent.dyn_cast<InputSection *>()) {		} else if (auto *referentIsec = r.referent.dyn_cast<InputSection *>()) {
assert(!::shouldOmitFromOutput(referentIsec));		assert(!::shouldOmitFromOutput(referentIsec));
referentVA = referentIsec->getVA(r.addend);		referentVA = referentIsec->getVA(r.addend);
}		}
target->relocateOne(loc, r, referentVA, getVA() + r.offset);		target->relocateOne(loc, r, referentVA, getVA() + r.offset);

		if (!optimizationHints.empty())
		relocTargets.push_back(referentVA);
int3Unsubmitted

Done

I wonder if it would make the code a bit cleaner to just pass in `isec` to `applyOptimizationHints` instead of computing + storing `relocVA`. Would make it clearer that `applyOptimizationHints` is working on a per-subsection basis at least, plus allocate a bit less memory.

(I suppose we could go one step further and extract `referentVA` from the buffer rather than storing it here, but that might not actually be more efficient...)

}		}

		if (!optimizationHints.empty())
		target->applyOptimizationHints(buf, this, relocTargets);
}		}

ConcatInputSection *macho::makeSyntheticInputSection(StringRef segName,		ConcatInputSection *macho::makeSyntheticInputSection(StringRef segName,
StringRef sectName,		StringRef sectName,
uint32_t flags,		uint32_t flags,
ArrayRef<uint8_t> data,		ArrayRef<uint8_t> data,
uint32_t align) {		uint32_t align) {
Section &section =		Section &section =
▲ Show 20 Lines • Show All 111 Lines • Show Last 20 Lines

lld/MachO/Options.td

Show First 20 Lines • Show All 1,251 Lines • ▼ Show 20 Lines	def i : Flag<["-"], "i">,
HelpText<"This option is undocumented in ld64">,		HelpText<"This option is undocumented in ld64">,
Flags<[HelpHidden]>,		Flags<[HelpHidden]>,
Group<grp_undocumented>;		Group<grp_undocumented>;
def ignore_auto_link : Flag<["-"], "ignore_auto_link">,		def ignore_auto_link : Flag<["-"], "ignore_auto_link">,
HelpText<"This option is undocumented in ld64">,		HelpText<"This option is undocumented in ld64">,
Flags<[HelpHidden]>,		Flags<[HelpHidden]>,
Group<grp_undocumented>;		Group<grp_undocumented>;
def ignore_optimization_hints : Flag<["-"], "ignore_optimization_hints">,		def ignore_optimization_hints : Flag<["-"], "ignore_optimization_hints">,
HelpText<"This option is undocumented in ld64">,		HelpText<"Ignore Linker Optimization Hints">,
Flags<[HelpHidden]>,
Group<grp_undocumented>;		Group<grp_undocumented>;
def init_offsets : Flag<["-"], "init_offsets">,		def init_offsets : Flag<["-"], "init_offsets">,
HelpText<"This option is undocumented in ld64">,		HelpText<"This option is undocumented in ld64">,
Flags<[HelpHidden]>,		Flags<[HelpHidden]>,
Group<grp_undocumented>;		Group<grp_undocumented>;
def keep_dwarf_unwind : Flag<["-"], "keep_dwarf_unwind">,		def keep_dwarf_unwind : Flag<["-"], "keep_dwarf_unwind">,
HelpText<"This option is undocumented in ld64">,		HelpText<"This option is undocumented in ld64">,
Flags<[HelpHidden]>,		Flags<[HelpHidden]>,
▲ Show 20 Lines • Show All 79 Lines • Show Last 20 Lines

lld/MachO/Relocations.h

Show First 20 Lines • Show All 64 Lines • ▼ Show 20 Lines	struct Reloc {
Reloc() = default;		Reloc() = default;

Reloc(uint8_t type, bool pcrel, uint8_t length, uint32_t offset,		Reloc(uint8_t type, bool pcrel, uint8_t length, uint32_t offset,
int64_t addend, llvm::PointerUnion<Symbol *, InputSection *> referent)		int64_t addend, llvm::PointerUnion<Symbol *, InputSection *> referent)
: type(type), pcrel(pcrel), length(length), offset(offset),		: type(type), pcrel(pcrel), length(length), offset(offset),
addend(addend), referent(referent) {}		addend(addend), referent(referent) {}
};		};

		struct OptimizationHint {
		// Offset of the first address within the containing InputSection.
		uint64_t offset0;
		// Offset of the other addresses relative to the first one.
		int16_t delta[2];
		uint8_t type;
		};

bool validateSymbolRelocation(const Symbol *, const InputSection *,		bool validateSymbolRelocation(const Symbol *, const InputSection *,
thakisUnsubmitted

Done

ld64 packs this into a single 64-bit word (cf `union LOH_arm64` in ld.hpp) and does a bunch of range checks to make sure that fits (cf `Section<arm64>::addLOH`, macho_relocatable_file.cpp). Could we do that too? Probably helps with perf (?)

const Reloc &);		const Reloc &);

/*		/*
grandinjUnsubmitted

Done

(*) offsets[2] is never used, so it looks like this array can drop the 3rd element.

(*) I suspect it may in fact be cheaper to drop the minOffset field, and make it be a member function that calculates the minimum each time, since the code that touches this is likely to be dominated by cache-hit ratio, and making this structure smaller will increase that ratio, but you'd have to benchmark that.

(*) which brings me to - are these offsets completely independent of each other, or could they be expressed, for example, as

uint64_t offset_base
uint32_t offset_offsets[2]; // add this to offset_base to get the actual offset

possibly? Which would further increase cache density.

* v: The value the relocation is attempting to encode		* v: The value the relocation is attempting to encode
* bits: The number of bits actually available to encode this relocation		* bits: The number of bits actually available to encode this relocation
*/		*/
void reportRangeError(void *loc, const Reloc &, const llvm::Twine &v,		void reportRangeError(void *loc, const Reloc &, const llvm::Twine &v,
uint8_t bits, int64_t min, uint64_t max);		uint8_t bits, int64_t min, uint64_t max);

struct SymbolDiagnostic {		struct SymbolDiagnostic {
const Symbol *symbol;		const Symbol *symbol;
Show All 38 Lines
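
The base-plus-delta layout of `OptimizationHint` shown above trades range for size: the later addresses are stored as 16-bit signed distances from the first one, so anything further away has to be rejected at parse time. A minimal sketch of that trade-off, under the assumption of a common 64-bit ABI, is given here. `CompactHint` mirrors the fields from the diff, while `encodeDelta` is a hypothetical helper and not a function from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Same fields as the OptimizationHint struct in the diff, renamed to make
// clear that this is an illustration.
struct CompactHint {
  uint64_t offset0; // Offset of the first address.
  int16_t delta[2]; // Offsets of the other addresses relative to offset0.
  uint8_t type;
};

// Assumption: uint64_t is 8-byte aligned, so 13 bytes of fields round up
// to 16. This holds on the usual 64-bit ABIs but is not guaranteed.
static_assert(sizeof(CompactHint) == 16, "expected a 16-byte record");

// Hypothetical helper: returns the signed 16-bit distance from `base` to
// `address`, or std::nullopt if it does not fit.
std::optional<int16_t> encodeDelta(uint64_t base, uint64_t address) {
  int64_t d = static_cast<int64_t>(address - base);
  if (d > std::numeric_limits<int16_t>::max() ||
      d < std::numeric_limits<int16_t>::min())
    return std::nullopt;
  return static_cast<int16_t>(d);
}
```

With this check, the `far.s` input in invalid-loh.s (an ADRP followed by `.zero 0x8000` before the ADD, a distance of 0x8004 bytes) falls just outside the representable range, which is what triggers the "addresses too far apart" error.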

lld/MachO/Target.h

Show All 21 Lines

namespace lld {		namespace lld {
namespace macho {		namespace macho {
LLVM_ENABLE_BITMASK_ENUMS_IN_NAMESPACE();		LLVM_ENABLE_BITMASK_ENUMS_IN_NAMESPACE();

class Symbol;		class Symbol;
class Defined;		class Defined;
class DylibSymbol;		class DylibSymbol;
class InputSection;		class ConcatInputSection;
		struct OptimizationHint;

class TargetInfo {		class TargetInfo {
public:		public:
template <class LP> TargetInfo(LP) {		template <class LP> TargetInfo(LP) {
// Having these values available in TargetInfo allows us to access them		// Having these values available in TargetInfo allows us to access them
// without having to resort to templates.		// without having to resort to templates.
magic = LP::magic;		magic = LP::magic;
pageZeroSize = LP::pageZeroSize;		pageZeroSize = LP::pageZeroSize;
Show All 34 Lines	public:
}		}

bool hasAttr(uint8_t type, RelocAttrBits bit) const {		bool hasAttr(uint8_t type, RelocAttrBits bit) const {
return getRelocAttrs(type).hasAttr(bit);		return getRelocAttrs(type).hasAttr(bit);
}		}

bool usesThunks() const { return thunkSize > 0; }		bool usesThunks() const { return thunkSize > 0; }

		virtual void applyOptimizationHints(uint8_t *buf, const ConcatInputSection *,
		llvm::ArrayRef<uint64_t>) const {};

uint32_t magic;		uint32_t magic;
llvm::MachO::CPUType cpuType;		llvm::MachO::CPUType cpuType;
uint32_t cpuSubtype;		uint32_t cpuSubtype;

uint64_t pageZeroSize;		uint64_t pageZeroSize;
size_t headerSize;		size_t headerSize;
size_t stubSize;		size_t stubSize;
size_t stubHelperHeaderSize;		size_t stubHelperHeaderSize;
▲ Show 20 Lines • Show All 62 Lines • Show Last 20 Lines

lld/test/MachO/invalid/invalid-loh.s

This file was added.

				# REQUIRES: aarch64

				# RUN: rm -rf %t; split-file %s %t
				# RUN: llvm-mc -filetype=obj -triple=arm64-apple-darwin %t/section.s -o %t/section.o
				# RUN: llvm-mc -filetype=obj -triple=arm64-apple-darwin %t/far.s -o %t/far.o
				# RUN: not %lld -arch arm64 %t/section.o -o /dev/null 2>&1 | FileCheck %s --check-prefix=SECTION
				# RUN: not %lld -arch arm64 %t/far.o -o /dev/null 2>&1 | FileCheck %s --check-prefix=FAR

				# SECTION: error: Linker optimization hint spans multiple sections
				# FAR: error: Linker optimization hint at offset 0 has addresses too far apart

				#--- section.s
				.globl _main
				_main:
				L1:
				adrp x0, _target@PAGE

				_foo:
				L2:
				add x0, x0, _target@PAGEOFF

				_target:

				.loh AdrpAdd L1, L2
				.subsections_via_symbols

				#--- far.s
				.globl _main
				_main:
				L1:
				adrp x0, _target@PAGE
				.zero 0x8000
				L2:
				add x0, x0, _target@PAGEOFF

				_target:

				.loh AdrpAdd L1, L2
				.subsections_via_symbols

lld/test/MachO/loh-adrp-add.s

This file was added.

				# REQUIRES: aarch64

				# RUN: llvm-mc -filetype=obj -triple=arm64-apple-darwin %s -o %t.o
				# RUN: %lld -arch arm64 %t.o -o %t
				# RUN: llvm-objdump -d --macho %t | FileCheck %s

int3Unsubmitted

Not Done

we should match against the warning messages too

BertalanDAuthorUnsubmitted

Done

I didn't end up adding warnings for the cases labeled "(invalid input)" here.

				# CHECK-LABEL: _main:
				## Out of range, before
				# CHECK-NEXT: adrp x0
				# CHECK-NEXT: add x0, x0
				## In range, before
				# CHECK-NEXT: adr x1
				# CHECK-NEXT: nop
				## Registers don't match (invalid input)
				# CHECK-NEXT: adrp x2
				# CHECK-NEXT: add x0
				## Targets don't match (invalid input)
				# CHECK-NEXT: adrp x3
				# CHECK-NEXT: add x3
				## Not an adrp instruction (invalid input)
				# CHECK-NEXT: nop
				# CHECK-NEXT: add x4
				## In range, after
				# CHECK-NEXT: adr x5
				# CHECK-NEXT: nop
				## In range, add's destination register is not the same as its source
				# CHECK-NEXT: adr x7
				# CHECK-NEXT: nop
				## Valid, non-adjacent instructions - start
				# CHECK-NEXT: adr x8
				## Out of range, after
				# CHECK-NEXT: adrp x9
				# CHECK-NEXT: add x9, x9
				## Valid, non-adjacent instructions - end
				# CHECK-NEXT: nop

				.text
				.align 2
				_before_far:
				.space 1048576

				_before_near:
				nop

				.globl _main
				_main:
				L1:
				adrp x0, _before_far@PAGE
				L2:
				add x0, x0, _before_far@PAGEOFF
				L3:
				adrp x1, _before_near@PAGE
				L4:
				add x1, x1, _before_near@PAGEOFF
				L5:
				adrp x2, _before_near@PAGE
				L6:
				add x0, x0, _before_near@PAGEOFF
				L7:
				adrp x3, _before_near@PAGE
				L8:
				add x3, x3, _after_near@PAGEOFF
				L9:
				nop
				L10:
				add x4, x4, _after_near@PAGEOFF
				L11:
				adrp x5, _after_near@PAGE
				L12:
				add x5, x5, _after_near@PAGEOFF
				L13:
				adrp x6, _after_near@PAGE
				L14:
				add x7, x6, _after_near@PAGEOFF
				L15:
				adrp x8, _after_near@PAGE
				L16:
				adrp x9, _after_far@PAGE
				L17:
				add x9, x9, _after_far@PAGEOFF
				L18:
				add x8, x8, _after_near@PAGEOFF

				_after_near:
				.space 1048576

				_after_far:
				nop

				.loh AdrpAdd L1, L2
				.loh AdrpAdd L3, L4
				.loh AdrpAdd L5, L6
				.loh AdrpAdd L7, L8
				.loh AdrpAdd L9, L10
				.loh AdrpAdd L11, L12
				.loh AdrpAdd L13, L14
				.loh AdrpAdd L15, L18
				.loh AdrpAdd L16, L17
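
The in-range and out-of-range cases in this test hinge on the reach of ADR, which encodes a signed 21-bit byte offset (+/- 1 MiB) relative to the instruction's own address. The following is a minimal C++ sketch of that range check and of the ADR encoding. The names `fitsInAdr`, `encodeAdr` and `kNop` are hypothetical and chosen for illustration; this is not the code added to ARM64.cpp in this diff.

```cpp
#include <cstdint>

// A64 NOP, used to overwrite the ADD once the ADRP has become an ADR.
constexpr uint32_t kNop = 0xd503201f;

// ADR can reach targets in [-2^20, 2^20 - 1] bytes from its own address.
// Hypothetical helper: both addresses are final virtual addresses.
constexpr bool fitsInAdr(uint64_t instrAddr, uint64_t targetAddr) {
  int64_t delta = static_cast<int64_t>(targetAddr - instrAddr);
  return delta >= -(int64_t{1} << 20) && delta < (int64_t{1} << 20);
}

// Encode "adr xN, <pc + delta>". The 21-bit immediate is split into
// immlo (bits 30:29) and immhi (bits 23:5); Rd occupies bits 4:0.
// The caller is assumed to have checked the range with fitsInAdr first.
constexpr uint32_t encodeAdr(uint32_t destReg, int64_t delta) {
  uint32_t imm = static_cast<uint32_t>(delta) & 0x1fffff;
  uint32_t immLo = imm & 0x3;
  uint32_t immHi = imm >> 2;
  return 0x10000000u | (immLo << 29) | (immHi << 5) | (destReg & 0x1f);
}
```

With these helpers, a linker pass would rewrite the pair only when `fitsInAdr` holds for the address of the first instruction, which is why the `_before_far` and `_after_far` cases above keep their ADRP+ADD sequence.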

lld/test/MachO/loh-adrp-adrp.s

This file was added.

				# REQUIRES: aarch64

				# RUN: llvm-mc -filetype=obj -triple=arm64-apple-darwin %s -o %t.o
				# RUN: %lld -arch arm64 %t.o -o %t
				# RUN: llvm-objdump -d --macho %t | FileCheck %s

				# CHECK-LABEL: _main:
				## Valid
				# CHECK-NEXT: adrp x0
				# CHECK-NEXT: nop
				## Mismatched registers
				# CHECK-NEXT: adrp x1
				# CHECK-NEXT: adrp x2
				## Not on the same page
				# CHECK-NEXT: adrp x3
				# CHECK-NEXT: adrp x3
				## Not an adrp instruction (invalid)
				# CHECK-NEXT: nop
				# CHECK-NEXT: adrp x4

				.text
				.align 2

				.globl _main
				_main:
				L1:
				adrp x0, _foo@PAGE
				L2:
				adrp x0, _bar@PAGE
				L3:
				adrp x1, _foo@PAGE
				L4:
				adrp x2, _bar@PAGE
				L5:
				adrp x3, _foo@PAGE
				L6:
				adrp x3, _baz@PAGE
				L7:
				nop
				L8:
				adrp x4, _baz@PAGE

				.data
				.align 12
				_foo:
				.byte 0
				_bar:
				.byte 0
				.space 4094
				_baz:
				.byte 0

				.loh AdrpAdrp L1, L2
				.loh AdrpAdrp L3, L4
				.loh AdrpAdrp L5, L6
				.loh AdrpAdrp L7, L8
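
The cases in this test reduce to one condition: the second ADRP is redundant only if it writes the same register as the first and both targets lie on the same 4 KiB page. The following is a minimal C++ sketch of that condition. `pageOf` and `secondAdrpIsRedundant` are hypothetical names, not the functions in this diff, and the sketch assumes both instructions have already been verified to be ADRP.

```cpp
#include <cstdint>

// ADRP materializes the 4 KiB-aligned page address of its target.
constexpr uint64_t pageOf(uint64_t addr) { return addr & ~uint64_t{0xfff}; }

// Two ADRPs produce the same value when they write the same register and
// their targets share a page; the second one can then become a NOP.
constexpr bool secondAdrpIsRedundant(uint32_t reg1, uint64_t target1,
                                     uint32_t reg2, uint64_t target2) {
  return reg1 == reg2 && pageOf(target1) == pageOf(target2);
}
```

For example, with made-up addresses `_foo = 0x4000`, `_bar = 0x4001` and `_baz = 0x5000` (matching the `.align 12` layout above), the L1/L2 pair qualifies, while L3/L4 fails on the register check and L5/L6 fails on the page check.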

llvm/include/llvm/BinaryFormat/MachO.h

		enum SecCSDigestAlgorithm {
		kSecCodeSignatureHashSHA1 = 1, /* SHA-1 */
		kSecCodeSignatureHashSHA256 = 2, /* SHA-256 */
		kSecCodeSignatureHashSHA256Truncated =
		3, /* SHA-256 truncated to first 20 bytes */
		kSecCodeSignatureHashSHA384 = 4, /* SHA-384 */
		kSecCodeSignatureHashSHA512 = 5, /* SHA-512 */
		};

		enum LinkerOptimizationHintKind {
		LOH_ARM64_ADRP_ADRP = 1,
		LOH_ARM64_ADRP_LDR = 2,
		LOH_ARM64_ADRP_ADD_LDR = 3,
		LOH_ARM64_ADRP_LDR_GOT_LDR = 4,
		LOH_ARM64_ADRP_ADD_STR = 5,
		LOH_ARM64_ADRP_LDR_GOT_STR = 6,
		LOH_ARM64_ADRP_ADD = 7,
		LOH_ARM64_ADRP_LDR_GOT = 8,
		};

		} // end namespace MachO
		} // end namespace llvm

		#endif
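
The enum values above are the kind codes stored in an object file's optimization hint data. The following is a minimal C++ sketch of reading one hint, assuming the layout LLVM's assembler emits for `.loh` directives: a ULEB128 kind, a ULEB128 argument count, then one ULEB128 address per argument. All names here (`Decoded`, `decodeUleb128`, `TwoArgHint`, `parseTwoArgHint`, `kExample`) are hypothetical, and the example bytes are made up; this is not the parser added to InputFiles.cpp.

```cpp
#include <cstddef>
#include <cstdint>

struct Decoded {
  uint64_t value; // decoded integer
  size_t next;    // index of the first byte after the encoded value
  bool ok;        // false if the input was truncated or too long
};

// Decode one ULEB128 value starting at data[pos].
constexpr Decoded decodeUleb128(const uint8_t *data, size_t size, size_t pos) {
  uint64_t value = 0;
  unsigned shift = 0;
  while (pos < size && shift < 64) {
    uint8_t byte = data[pos++];
    value |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0)
      return {value, pos, true};
    shift += 7;
  }
  return {0, pos, false};
}

struct TwoArgHint {
  uint64_t kind;   // e.g. 7 for LOH_ARM64_ADRP_ADD
  uint64_t first;  // address of the first instruction
  uint64_t second; // address of the second instruction
  bool ok;
};

// Parse a single hint that carries exactly two addresses.
constexpr TwoArgHint parseTwoArgHint(const uint8_t *data, size_t size) {
  Decoded kind = decodeUleb128(data, size, 0);
  if (!kind.ok)
    return {0, 0, 0, false};
  Decoded count = decodeUleb128(data, size, kind.next);
  if (!count.ok || count.value != 2)
    return {0, 0, 0, false};
  Decoded first = decodeUleb128(data, size, count.next);
  if (!first.ok)
    return {0, 0, 0, false};
  Decoded second = decodeUleb128(data, size, first.next);
  if (!second.ok)
    return {0, 0, 0, false};
  return {kind.value, first.value, second.value, true};
}

// Made-up example: kind 7 (ADRP_ADD), 2 arguments, addresses 0x80 and 0x84.
constexpr uint8_t kExample[] = {0x07, 0x02, 0x80, 0x01, 0x84, 0x01};
```

Decoding `kExample` yields kind 7 with addresses 0x80 and 0x84. A real parser would loop over the whole payload and handle the three-argument kinds as well.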

Revision Contents

Diff 441137
