This is an archive of the discontinued LLVM Phabricator instance.

Differential D121342

[lld-macho] Align cstrings less conservatively
Closed · Public

Authored by int3 on Mar 9 2022, 2:37 PM.


Details

Reviewers: oontvoo

Group Reviewers: Restricted Project

Commits: rG4308f031cd0c: [lld-macho] Align cstrings less conservatively

Summary

Previously, we aligned every cstring to 16 bytes as a temporary hack to
deal with https://github.com/llvm/llvm-project/issues/50135. However, it
was highly wasteful in terms of binary size.

To recap, in contrast to ELF, which puts strings that need different
alignments into different sections, clang's Mach-O backend puts them
all in one section. Strings that need to be aligned have the .p2align
directive emitted before them, which simply translates into zero padding
in the object file. In other words, we have to infer the alignment of
the cstrings from their addresses.

We differ slightly from ld64 in how we've chosen to align these
cstrings. Both LLD and ld64 preserve the number of trailing zeros in
each cstring's address in the input object files. When deduplicating
identical cstrings, both linkers pick the cstring whose address has more
trailing zeros, and preserve the alignment of that address in the final
binary. However, ld64 goes a step further and also preserves the offset
of the cstring from the last section-aligned address. I.e. if a cstring
is at offset 18 in the input, with a section alignment of 16, then both
LLD and ld64 will ensure the final address is 2-byte aligned (since
18 == 16 + 2). But ld64 will also ensure that the final address is of
the form 16 * k + 2 for some k (which implies 2-byte alignment).
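
As a rough illustration of the rule described above, here is a minimal C++ sketch of how a per-string alignment can be inferred from the section alignment and the string's offset. The helper names `trailingZeros` and `inferPieceAlign` are made up for this example; the patch itself uses LLVM's `countTrailingZeros` on `isec->align | piece.inSecOff`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::countTrailingZeros; `v` must be nonzero.
static unsigned trailingZeros(uint64_t v) {
  assert(v != 0 && "section alignment is expected to be a nonzero power of two");
  unsigned n = 0;
  while ((v & 1) == 0) {
    v >>= 1;
    ++n;
  }
  return n;
}

// Infer the alignment to preserve for a cstring located at `inSecOff` within
// a section aligned to `sectionAlign` bytes. OR-ing the two values keeps the
// lowest set bit of either one, which is the largest power of two guaranteed
// to divide the string's address.
uint64_t inferPieceAlign(uint64_t sectionAlign, uint64_t inSecOff) {
  return uint64_t(1) << trailingZeros(sectionAlign | inSecOff);
}
```

For the example in the summary, `inferPieceAlign(16, 18)` yields 2, while an offset of 0 in the same section yields the full 16.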

Note that ld64's heuristic means that a dedup'ed cstring's final address is
dependent on the order of the input object files. E.g. if in addition to the
cstring at offset 18 above, we have a duplicate one in another file with a
.cstring section alignment of 2 and an offset of zero, then ld64 will pick
the cstring from the object file earlier on the command line (since both have
the same number of trailing zeros in their address). So the final cstring may
either be at some address 16 * k + 2 or at some address 2 * k.
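
For contrast, the two-pass approach this patch takes can be sketched as follows. This is a simplified model, not the code from the diff: `Piece` and `layoutDeduplicated` are hypothetical names, strings are keyed by `std::string` rather than `CachedHashStringRef`, and the NUL terminator is counted explicitly as one extra byte.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of one cstring as it appears in an input object file.
struct Piece {
  std::string data;      // contents, without the NUL terminator
  uint64_t sectionAlign; // alignment of the containing section (power of two)
  uint64_t inSecOff;     // offset of the string within that section
};

static unsigned trailingZeros(uint64_t v) {
  assert(v != 0);
  unsigned n = 0;
  while ((v & 1) == 0) {
    v >>= 1;
    ++n;
  }
  return n;
}

static uint64_t alignTo(uint64_t value, uint64_t align) {
  return (value + align - 1) / align * align;
}

// Returns the output offset chosen for each distinct string.
std::unordered_map<std::string, uint64_t>
layoutDeduplicated(const std::vector<Piece> &pieces) {
  // Pass 1: record the largest trailing-zero count seen for each string.
  std::unordered_map<std::string, unsigned> zeros;
  for (const Piece &p : pieces) {
    unsigned tz = trailingZeros(p.sectionAlign | p.inSecOff);
    auto res = zeros.emplace(p.data, tz);
    if (!res.second)
      res.first->second = std::max(res.first->second, tz);
  }
  // Pass 2: assign an offset the first time each string is encountered.
  std::unordered_map<std::string, uint64_t> offsets;
  uint64_t size = 0;
  for (const Piece &p : pieces) {
    if (offsets.count(p.data))
      continue;
    uint64_t off = alignTo(size, uint64_t(1) << zeros[p.data]);
    offsets[p.data] = off;
    size = off + p.data.size() + 1; // +1 for the NUL terminator
  }
  return offsets;
}
```

In this model the alignment chosen for a deduplicated string is the maximum over all of its copies, so it does not depend on which copy appears first on the command line. For the example above (one copy at offset 18 in a 16-byte-aligned section, another at offset 0 in a 2-byte-aligned section), both copies have one trailing zero and the string is simply 2-byte aligned in either order.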

I've opted not to follow this behavior primarily for implementation
simplicity, and secondarily to save a few more bytes. It's not clear to me
that preserving the section alignment + offset is ever necessary, and there
are many cases that are clearly redundant. In particular, if an x86_64 object
file contains some strings that are accessed via SIMD instructions, then the
.cstring section in the object file will be 16-byte-aligned (since SIMD
requires its operand addresses to be 16-byte aligned). However, there will
typically also be other cstrings in the same file that aren't used via SIMD
and don't need this alignment. They will be emitted at some arbitrary address
A, but ld64 will treat them as being 16-byte aligned with an offset of
A % 16.
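
To make the size difference concrete, here is a small sketch comparing the two placement rules. It models ld64 only as described in this summary (preserving the offset from the last section-aligned address, i.e. the address modulo the section alignment) and is not based on ld64's source; `nextWithResidue` and `nextAligned` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Round `value` up to the next multiple of `align` (a power of two).
static uint64_t alignTo(uint64_t value, uint64_t align) {
  return (value + align - 1) & ~(align - 1);
}

// Model of the ld64 behavior as described above: the smallest address
// >= `start` that is congruent to `residue` modulo `sectionAlign`.
uint64_t nextWithResidue(uint64_t start, uint64_t sectionAlign,
                         uint64_t residue) {
  assert(residue < sectionAlign);
  uint64_t candidate = start - (start % sectionAlign) + residue;
  return candidate >= start ? candidate : candidate + sectionAlign;
}

// Model of the LLD behavior: only the inferred power-of-two alignment matters.
uint64_t nextAligned(uint64_t start, uint64_t pieceAlign) {
  return alignTo(start, pieceAlign);
}
```

With 3 bytes already laid out, a string that sat at input offset 18 in a 16-byte-aligned section would go to `nextWithResidue(3, 16, 2) == 18` under the first rule but to `nextAligned(3, 2) == 4` under the second, which is where the extra bytes are saved.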

I have verified that the two repros in https://github.com/llvm/llvm-project/issues/50135
work well with the new alignment behavior.

Fixes https://github.com/llvm/llvm-project/issues/54036.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

int3 created this revision. · Mar 9 2022, 2:37 PM

Herald added projects: Restricted Project, Restricted Project. · Mar 9 2022, 2:37 PM

Herald added a subscriber: pengfei.

int3 requested review of this revision. · Mar 9 2022, 2:37 PM

Herald added a project: Restricted Project. · Mar 9 2022, 2:37 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B153442: Diff 414218. · Mar 9 2022, 2:48 PM

int3 edited the summary of this revision. · Mar 9 2022, 2:56 PM

int3 edited the summary of this revision.

oontvoo added a subscriber: oontvoo. · Mar 9 2022, 6:34 PM

oontvoo added inline comments.

lld/MachO/SyntheticSections.cpp
1366–1376	should we note something about this in the ld64-vs-lld.rst doc? (Not this implementation detail particularly but rather the possible observable difference in the final binaries)
1419–1429	how's this? just to make it a bit clearer you're updating the entry; otherwise it's a bit easy to miss at first glance. (related nit: could have a static constant (e.g., TombStone) so we don't have to remember using UINT64_MAX)
lld/MachO/SyntheticSections.h
534	just delete the whole decl? (default ctor is implicitly provided already)
541	(consistency with "isec")

int3 edited the summary of this revision. · Mar 9 2022, 7:35 PM

int3 marked 4 inline comments as done. · Mar 9 2022, 7:58 PM

int3 added inline comments.

lld/MachO/SyntheticSections.cpp
1366–1376	Done (but with a rather brief note). I don't think it's worth diving into the specifics of e.g. `16 * k + 2` (maybe that's what you meant by 'implementation detail'), that would just confuse most end users... but yeah, on the off chance that some build depends on this exact behavior, it could help someone triage the problem. Thanks!
1419–1429	excellent suggestion, it's clearer this way :) re static constant, we already use `UINT*_MAX` as a sentinel value in a lot of other places in the code (both in LLD/ELF and LLD/MachO), so I think it's a pretty common pattern overall and not likely to cause confusion
lld/MachO/SyntheticSections.h
541	`outSecOff` is already used in a lot of other places though, so I'm just following suit here. Arguably it's inconsistent with `isec`, but the counterargument is that member names should be more explicit/verbose than local variable names. FWIW, LLD/ELF uses `outSecOff` together with `isec` as well.

address comments

lld/MachO/SyntheticSections.cpp
1377–1396	I've added this paragraph to further make the case that it's unlikely for any build to depend on this specific behavior.

Harbormaster completed remote builds in B153478: Diff 414266. · Mar 9 2022, 8:23 PM

int3 edited the summary of this revision. · Mar 9 2022, 9:02 PM

LG thanks!

This revision is now accepted and ready to land. · Mar 10 2022, 7:33 AM

Closed by commit rG4308f031cd0c: [lld-macho] Align cstrings less conservatively (authored by int3). · Mar 10 2022, 12:18 PM

This revision was automatically updated to reflect the committed changes.

int3 added a commit: rG4308f031cd0c: [lld-macho] Align cstrings less conservatively.

Revision Contents

Path	Size
lld/MachO/SyntheticSections.h	14 lines
lld/MachO/SyntheticSections.cpp	105 lines
lld/MachO/ld64-vs-lld.rst	6 lines
lld/test/MachO/cstring-align.s	133 lines
lld/test/MachO/cstring-dedup.s	9 lines

Diff 414456

lld/MachO/SyntheticSections.h

  public:
    std::vector<CStringInputSection *> inputs;
  private:
    uint64_t size;
  };
  class DeduplicatedCStringSection final : public CStringSection {
  public:
-   DeduplicatedCStringSection();
+   uint64_t getSize() const override { return size; }

oontvoo (inline comment, marked done): just delete the whole decl? (default ctor is implicitly provided already)

-   uint64_t getSize() const override { return builder.getSize(); }
    void finalizeContents() override;
-   void writeTo(uint8_t *buf) const override { builder.write(buf); }
+   void writeTo(uint8_t *buf) const override;
  private:
-   llvm::StringTableBuilder builder;
+   struct StringOffset {
+     uint8_t trailingZeros;
+     uint64_t outSecOff = UINT64_MAX;

oontvoo (inline comment, marked done):

  uint8_t trailingZeros;
- uint64_t outSecOff = UINT64_MAX;
+ uint64_t osecOff = UINT64_MAX;
  explicit StringOffset(uint8_t zeros) : trailingZeros(zeros) {}

(consistency with "isec")

int3 (author, marked done): `outSecOff` is already used in a lot of other places though, so I'm just following suit here. Arguably it's inconsistent with `isec`, but the counterargument is that member names should be more explicit/verbose than local variable names.

FWIW, LLD/ELF uses `outSecOff` together with `isec` as well.

+     explicit StringOffset(uint8_t zeros) : trailingZeros(zeros) {}
+   };
+   llvm::DenseMap<llvm::CachedHashStringRef, StringOffset> stringOffsetMap;
+   size_t size = 0;
  };
  /*
   * This section contains deduplicated literal values. The 16-byte values are
   * laid out first, followed by the 8- and then the 4-byte ones.
   */
  class WordLiteralSection final : public SyntheticSection {
  public:

lld/MachO/SyntheticSections.cpp

  }
  void CStringSection::finalizeContents() {
    uint64_t offset = 0;
    for (CStringInputSection *isec : inputs) {
      for (size_t i = 0, e = isec->pieces.size(); i != e; ++i) {
        if (!isec->pieces[i].live)
          continue;
-       uint32_t pieceAlign = MinAlign(isec->pieces[i].inSecOff, align);
+       // See comment above DeduplicatedCStringSection for how alignment is
+       // handled.
+       uint32_t pieceAlign =
+           1 << countTrailingZeros(isec->align | isec->pieces[i].inSecOff);
        offset = alignTo(offset, pieceAlign);
        isec->pieces[i].outSecOff = offset;
        isec->isFinal = true;
        StringRef string = isec->getStringRef(i);
        offset += string.size();
      }
    }
    size = offset;
  }

  // Mergeable cstring literals are found under the __TEXT,__cstring section. In
  // contrast to ELF, which puts strings that need different alignments into
  // different sections, clang's Mach-O backend puts them all in one section.
  // Strings that need to be aligned have the .p2align directive emitted before
- // them, which simply translates into zero padding in the object file.
+ // them, which simply translates into zero padding in the object file. In other
+ // words, we have to infer the desired alignment of these cstrings from their
+ // addresses.
  //
- // I *think* ld64 extracts the desired per-string alignment from this data by
- // preserving each string's offset from the last section-aligned address. I'm
- // not entirely certain since it doesn't seem consistent about doing this, and
- // in fact doesn't seem to be correct in general: we can in fact can induce ld64
- // to produce a crashing binary just by linking in an additional object file
- // that only contains a duplicate cstring at a different alignment. See PR50563
- // for details.
+ // We differ slightly from ld64 in how we've chosen to align these cstrings.
+ // Both LLD and ld64 preserve the number of trailing zeros in each cstring's
+ // address in the input object files. When deduplicating identical cstrings,
+ // both linkers pick the cstring whose address has more trailing zeros, and
+ // preserve the alignment of that address in the final binary. However, ld64
+ // goes a step further and also preserves the offset of the cstring from the
+ // last section-aligned address. I.e. if a cstring is at offset 18 in the
+ // input, with a section alignment of 16, then both LLD and ld64 will ensure the
+ // final address is 2-byte aligned (since 18 == 16 + 2). But ld64 will also
+ // ensure that the final address is of the form 16 * k + 2 for some k.
  //

oontvoo (inline comment, marked done): should we note something about this in the ld64-vs-lld.rst doc?
(Not this implementation detail particularly but rather the possible observable difference in the final binaries)

int3 (author, marked done): Done (but with a rather brief note). I don't think it's worth diving into the specifics of e.g. `16 * k + 2` (maybe that's what you meant by 'implementation detail'), that would just confuse most end users... but yeah, on the off chance that some build depends on this exact behavior, it could help someone triage the problem. Thanks!

- // On x86_64, the cstrings we've seen so far that require special alignment are
- // all accessed by SIMD operations -- x86_64 requires SIMD accesses to be
- // 16-byte-aligned. arm64 also seems to require 16-byte-alignment in some cases
- // (PR50791), but I haven't tracked down the root cause. So for now, I'm just
- // aligning all strings to 16 bytes. This is indeed wasteful, but
- // implementation-wise it's simpler than preserving per-string
- // alignment+offsets. It also avoids the aforementioned crash after
- // deduplication of differently-aligned strings. Finally, the overhead is not
- // huge: using 16-byte alignment (vs no alignment) is only a 0.5% size overhead
- // when linking chromium_framework on x86_64.
- DeduplicatedCStringSection::DeduplicatedCStringSection()
-     : builder(StringTableBuilder::RAW, /*Alignment=*/16) {}
+ // Note that ld64's heuristic means that a dedup'ed cstring's final address is
+ // dependent on the order of the input object files. E.g. if in addition to the
+ // cstring at offset 18 above, we have a duplicate one in another file with a
+ // `.cstring` section alignment of 2 and an offset of zero, then ld64 will pick
+ // the cstring from the object file earlier on the command line (since both have
+ // the same number of trailing zeros in their address). So the final cstring may
+ // either be at some address `16 * k + 2` or at some address `2 * k`.
+ //
+ // I've opted not to follow this behavior primarily for implementation
+ // simplicity, and secondarily to save a few more bytes. It's not clear to me
+ // that preserving the section alignment + offset is ever necessary, and there
+ // are many cases that are clearly redundant. In particular, if an x86_64 object
+ // file contains some strings that are accessed via SIMD instructions, then the
+ // .cstring section in the object file will be 16-byte-aligned (since SIMD
+ // requires its operand addresses to be 16-byte aligned). However, there will
+ // typically also be other cstrings in the same file that aren't used via SIMD
+ // and don't need this alignment. They will be emitted at some arbitrary address
+ // `A`, but ld64 will treat them as being 16-byte aligned with an offset of `16
+ // % A`.
  void DeduplicatedCStringSection::finalizeContents() {

int3 (author, marked done): I've added this paragraph to further make the case that it's unlikely for any build to depend on this specific behavior.

-   // Add all string pieces to the string table builder to create section
-   // contents.
+   // Find the largest alignment required for each string.
+   for (const CStringInputSection *isec : inputs) {
+     for (size_t i = 0, e = isec->pieces.size(); i != e; ++i) {
+       const StringPiece &piece = isec->pieces[i];
+       if (!piece.live)
+         continue;
+       auto s = isec->getCachedHashStringRef(i);
+       assert(isec->align != 0);
+       uint8_t trailingZeros = countTrailingZeros(isec->align | piece.inSecOff);
+       auto it = stringOffsetMap.insert(
+           std::make_pair(s, StringOffset(trailingZeros)));
+       if (!it.second && it.first->second.trailingZeros < trailingZeros)
+         it.first->second.trailingZeros = trailingZeros;
+     }
+   }
+   // Assign an offset for each string and save it to the corresponding
+   // StringPieces for easy access.
    for (CStringInputSection *isec : inputs) {
-     for (size_t i = 0, e = isec->pieces.size(); i != e; ++i)
-       if (isec->pieces[i].live)
-         isec->pieces[i].outSecOff =
-             builder.add(isec->getCachedHashStringRef(i));
+     for (size_t i = 0, e = isec->pieces.size(); i != e; ++i) {
+       if (!isec->pieces[i].live)
+         continue;
+       auto s = isec->getCachedHashStringRef(i);
+       auto it = stringOffsetMap.find(s);
+       assert(it != stringOffsetMap.end());
+       StringOffset &offsetInfo = it->second;
+       if (offsetInfo.outSecOff == UINT64_MAX) {
+         offsetInfo.outSecOff = alignTo(size, 1 << offsetInfo.trailingZeros);
+         size = offsetInfo.outSecOff + s.size();
+       }
+       isec->pieces[i].outSecOff = offsetInfo.outSecOff;
+     }
      isec->isFinal = true;

oontvoo (inline comment, marked done):

          continue;
        auto s = isec->getCachedHashStringRef(i);
        auto it = stringOffsetMap.find(s);
        assert(it != stringOffsetMap.end());
-       uint64_t &outSecOff = it->second.outSecOff;
-       if (outSecOff == UINT64_MAX) {
-         outSecOff = alignTo(size, 1 << it->second.trailingZeros);
+       StringOffset& offsetInfo = it->second;
+       if (offsetInfo.outSecOff == UINT64_MAX) {
+         offsetInfo.outSecOff = alignTo(size, 1 << offsetInfo.trailingZeros);
          size = outSecOff + s.size();
        }
-       isec->pieces[i].outSecOff = outSecOff;
+       isec->pieces[i].outSecOff = offsetInfo.outSecOff;
      }
      isec->isFinal = true;
    }

how's this?
just to make it a bit clearer you're updating the entry; otherwise it's a bit easy to miss at first glance.

(related nit: could have a static constant (e.g., TombStone) so we don't have to remember using UINT64_MAX)

int3 (author, marked done): excellent suggestion, it's clearer this way :)

re static constant, we already use `UINT*_MAX` as a sentinel value in a lot of other places in the code (both in LLD/ELF and LLD/MachO), so I think it's a pretty common pattern overall and not likely to cause confusion

    }
+ }
- 
-   builder.finalizeInOrder();
+ 
+ void DeduplicatedCStringSection::writeTo(uint8_t *buf) const {
+   for (const auto &p : stringOffsetMap) {
+     StringRef data = p.first.val();
+     uint64_t off = p.second.outSecOff;
+     if (!data.empty())
+       memcpy(buf + off, data.data(), data.size());
+   }
  }
  // This section is actually emitted as __TEXT,__const by ld64, but clang may
  // emit input sections of that name, and LLD doesn't currently support mixing
  // synthetic and concat-type OutputSections. To work around this, I've given
  // our merged-literals section a different name.
  WordLiteralSection::WordLiteralSection()
      : SyntheticSection(segment_names::text, section_names::literals) {

lld/MachO/ld64-vs-lld.rst

  ==================
  LD64 vs LLD-MACHO
  ==================

  This doc lists all significant deliberate differences in behavior between LD64 and LLD-MachO.

  String literal deduplication
  ****************************
  LD64 always deduplicates string literals. LLD only does it when the `--icf=` or
  the `--deduplicate-literals` flag is passed. Omitting deduplication by default
  ensures that our link is as fast as possible. However, it may also break some
  programs which have (incorrectly) relied on string deduplication always
  occurring. In particular, programs which compare string literals via pointer
  equality must be fixed to use value equality instead.

+ String Alignment
+ ****************
+ LLD is slightly less conservative about aligning cstrings, allowing it to pack
+ them more compactly. This should not result in any meaningful semantic
+ difference.
+ 
  ``-no_deduplicate`` Flag
  **********************
  - LD64:
  * This turns off ICF (deduplication pass) in the linker.
  - LLD
  * This turns off ICF and string merging in the linker.

  ObjC symbols treatment
  **********************
  There are differences in how LLD and LD64 handle ObjC symbols loaded from archives.

  - LD64:
  * Duplicate ObjC symbols from the same archives will not raise an error. LD64 will pick the first one.
  * Duplicate ObjC symbols from different archives will raise a "duplicate symbol" error.
  - LLD:
  * Duplicate symbols, regardless of which archives they are from, will raise errors.

lld/test/MachO/cstring-align.s

This file was added.

# REQUIRES: x86
# RUN: rm -rf %t; split-file %s %t
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-empty.s -o %t/align-empty.o
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-4-0.s -o %t/align-4-0.o
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-4-2.s -o %t/align-4-2.o
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-16-0.s -o %t/align-16-0.o
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-16-2.s -o %t/align-16-2.o
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-16-4.s -o %t/align-16-4.o
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-16-8.s -o %t/align-16-8.o

## Check that we preserve the alignment of cstrings. Alignment is determined
## not by section alignment but by the number of trailing zeros of the cstring's
## address in the input object file.

## The non-dedup case is not particularly interesting since the null bytes don't
## get dedup'ed, meaning that the output strings get their offsets "naturally"
## preserved.

# RUN: %lld -dylib %t/align-empty.o %t/align-4-0.o %t/align-16-0.o -o %t/align-4-0-16-0
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/align-4-0-16-0 | \
# RUN: FileCheck %s -D#OFF1=4 -D#OFF2=16
# RUN: %lld -dylib %t/align-empty.o %t/align-16-0.o %t/align-4-0.o -o %t/align-16-0-4-0
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/align-16-0-4-0 | \
# RUN: FileCheck %s -D#OFF1=16 -D#OFF2=20

# RUN: %lld -dylib %t/align-empty.o %t/align-4-2.o %t/align-16-0.o -o %t/align-4-2-16-0
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/align-4-2-16-0 | \
# RUN: FileCheck %s -D#OFF1=6 -D#OFF2=16
# RUN: %lld -dylib %t/align-empty.o %t/align-16-0.o %t/align-4-2.o -o %t/align-16-0-4-2
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/align-16-0-4-2 | \
# RUN: FileCheck %s -D#OFF1=16 -D#OFF2=22

# RUN: %lld -dylib %t/align-empty.o %t/align-4-0.o %t/align-16-2.o -o %t/align-4-0-16-2
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/align-4-0-16-2 | \
# RUN: FileCheck %s -D#OFF1=4 -D#OFF2=18
# RUN: %lld -dylib %t/align-empty.o %t/align-16-2.o %t/align-4-0.o -o %t/align-16-2-4-0
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/align-16-2-4-0 | \
# RUN: FileCheck %s -D#OFF1=18 -D#OFF2=20

# CHECK: Contents of (__TEXT,__cstring) section
# CHECK-NEXT: [[#%.16x,START:]] {{$}}
# CHECK: [[#%.16x,START+OFF1]] a{{$}}
# CHECK: [[#%.16x,START+OFF2]] a{{$}}
# CHECK-EMPTY:

## The dedup cases are more interesting...

## Same offset, different alignments => pick higher alignment
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-4-0.o %t/align-16-0.o -o %t/dedup-4-0-16-0
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-4-0-16-0 | \
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=16
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-16-0.o %t/align-4-0.o -o %t/dedup-16-0-4-0
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-16-0-4-0 | \
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=16

## 16 byte alignment vs 2 byte offset => align to 16 bytes
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-4-2.o %t/align-16-0.o -o %t/dedup-4-2-16-0
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-4-2-16-0 | \
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=16
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-16-0.o %t/align-4-2.o -o %t/dedup-16-0-4-2
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-16-0-4-2 | \
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=16

## 4 byte alignment vs 2 byte offset => align to 4 bytes
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-4-0.o %t/align-16-2.o -o %t/dedup-4-0-16-2
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-4-0-16-2 | \
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=4
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-16-2.o %t/align-4-0.o -o %t/dedup-16-2-4-0
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-16-2-4-0 | \
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=4

## Both inputs are 4-byte aligned, one via offset and the other via section alignment
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-4-0.o %t/align-16-4.o -o %t/dedup-4-0-16-4
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-4-0-16-4 | \
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=4
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-16-4.o %t/align-4-0.o -o %t/dedup-16-4-4-0
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-16-4-4-0 | \
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=4

## 8-byte offset vs 4-byte section alignment => align to 8 bytes
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-4-0.o %t/align-16-8.o -o %t/dedup-4-0-16-8
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-4-0-16-8 | \
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=8
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-16-8.o %t/align-4-0.o -o %t/dedup-16-8-4-0
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-16-8-4-0 | \
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=8

# DEDUP: Contents of (__TEXT,__cstring) section
# DEDUP-NEXT: [[#%.16x,START:]] {{$}}
# DEDUP: [[#%.16x,START+OFF]] a{{$}}
# DEDUP-EMPTY:

#--- align-empty.s
## We use this file to create an empty string at the start of every output
## file's .cstring section. This makes the test cases more interesting since LLD
## can't place the string "a" at the trivially-aligned zero offset.
.cstring
.p2align 2
.asciz ""

#--- align-4-0.s
.cstring
.p2align 2
.asciz "a"

#--- align-4-2.s
.cstring
.p2align 2
.zero 0x2
.asciz "a"

#--- align-16-0.s
.cstring
.p2align 4
.asciz "a"

#--- align-16-2.s
.cstring
.p2align 4
.zero 0x2
.asciz "a"

#--- align-16-4.s
.cstring
.p2align 4
.zero 0x4
.asciz "a"

#--- align-16-8.s
.cstring
.p2align 4
.zero 0x8
.asciz "a"

lld/test/MachO/cstring-dedup.s

  # REQUIRES: x86
  # RUN: rm -rf %t; split-file %s %t
  # RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/test.s -o %t/test.o
  # RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/more-foo.s -o %t/more-foo.o
  # RUN: %lld -dylib --deduplicate-literals %t/test.o %t/more-foo.o -o %t/test
  # RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/test | \
  # RUN: FileCheck %s --check-prefix=STR --implicit-check-not foo --implicit-check-not bar
  # RUN: llvm-objdump --macho --section="__DATA,ptrs" --syms %t/test | FileCheck %s
  # RUN: llvm-readobj --section-headers %t/test | FileCheck %s --check-prefix=HEADER

- ## Make sure we only have 3 deduplicated strings in __cstring, and that they
- ## are 16-byte-aligned.
+ ## Make sure we only have 3 deduplicated strings in __cstring.
  # STR: Contents of (__TEXT,__cstring) section
- # STR: {{.*}}0 foo
- # STR: {{.*}}0 barbaz
- # STR: {{.*}}0 {{$}}
+ # STR: {{[[:xdigit:]]+}} foo
+ # STR: {{[[:xdigit:]]+}} barbaz
+ # STR: {{[[:xdigit:]]+}} {{$}}

  ## Make sure both symbol and section relocations point to the right thing.
  # CHECK: Contents of (__DATA,ptrs) section
  # CHECK-NEXT: __TEXT:__cstring:foo
  # CHECK-NEXT: __TEXT:__cstring:foo
  # CHECK-NEXT: __TEXT:__cstring:foo
  # CHECK-NEXT: __TEXT:__cstring:foo
  # CHECK-NEXT: __TEXT:__cstring:foo
