This is an archive of the discontinued LLVM Phabricator instance.

Differential D103113

[lld-macho] Deduplicate fixed-width literals
ClosedPublic

Authored by int3 on May 25 2021, 1:33 PM.

Details

Reviewers

gkm

Group Reviewers

Restricted Project

Commits

rG5d88f2dd9478: [lld-macho] Deduplicate fixed-width literals

Summary

Conceptually, the implementation is pretty straightforward: we put each
literal value into a hashtable, and then write out the keys of that
hashtable at the end.

In contrast with ELF, the Mach-O format does not support variable-length
literals that aren't strings. Its literals are either 4, 8, or 16 bytes
in length. LLD-ELF dedups its literals via sorting + uniq'ing, but since
we don't need to worry about overly-long values, we should be able to do
a faster job by just hashing.

That said, the implementation right now is far from optimal, because we
add to those hashtables serially. To parallelize this, we'll need a
basic concurrent hashtable (it only needs to support concurrent writes
without interleaved reads), which shouldn't be too hard to implement, but
I'd like to punt on it for now.
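
As a rough illustration of the hashing approach described above (a standalone sketch, not the code in this diff; `dedupLiteral4` is a made-up name), each distinct value gets the next free index the first time it is seen, and the keys are written out by index at the end:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Deduplicate a buffer of 4-byte literals. Each distinct value is assigned
// the next free index on first sight; duplicates keep the existing index.
std::vector<uint32_t> dedupLiteral4(const std::vector<uint8_t> &data) {
  std::unordered_map<uint32_t, uint64_t> indexOf;
  for (size_t i = 0, e = data.size() / 4; i < e; ++i) {
    uint32_t value;
    std::memcpy(&value, data.data() + i * 4, 4); // alignment-safe load
    indexOf.emplace(value, indexOf.size());      // no-op if already present
  }
  // Write out the keys, each at the slot given by its index. Iteration
  // order of the map is unspecified, but the stored index is not.
  std::vector<uint32_t> out(indexOf.size());
  for (const auto &p : indexOf)
    out[p.second] = p.first;
  return out;
}
```

Because the index is stored as the map's value, the output order matches first-seen order regardless of how the hash table iterates.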

Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:

    N           Min           Max        Median           Avg        Stddev
x  20          4.27          4.39         4.315        4.3225   0.033225703
+  20          4.36          4.82          4.44        4.4845    0.13152846
Difference at 95.0% confidence
        0.162 +/- 0.0613971
        3.74783% +/- 1.42041%
        (Student's t, pooled s = 0.0959262)

This corresponds to binary size savings of 2MB out of 335MB, or 0.6%.
It's not a great tradeoff as-is, but as mentioned, our implementation can
be significantly optimized, and literal dedup will unlock more
opportunities for ICF to identify identical structures that reference
the same literals.
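
For readers who want to check the ministat figures above, here is a small sketch of the arithmetic. It assumes the `x` row is the baseline and the `+` row is the run with this diff, and it takes the Student's t critical value as a parameter (roughly 2.024 for 38 degrees of freedom at 95%, which is my assumption about what ministat used). The struct and function names are invented for the example.

```cpp
#include <cmath>

// Summary statistics for one ministat row.
struct Sample {
  int n;
  double mean;
  double stddev;
};

// Pooled standard deviation of two samples, weighted by degrees of freedom.
double pooledStddev(const Sample &a, const Sample &b) {
  double num =
      (a.n - 1) * a.stddev * a.stddev + (b.n - 1) * b.stddev * b.stddev;
  return std::sqrt(num / (a.n + b.n - 2));
}

// Half-width of the confidence interval for the difference of the means.
// tCrit is the two-sided critical value for a.n + b.n - 2 degrees of freedom.
double halfWidth(const Sample &a, const Sample &b, double tCrit) {
  return tCrit * pooledStddev(a, b) * std::sqrt(1.0 / a.n + 1.0 / b.n);
}

// Express an absolute difference as a percentage of the baseline mean.
double relativePercent(double delta, const Sample &base) {
  return 100.0 * delta / base.mean;
}
```

Plugging in the two rows gives a pooled deviation of about 0.0959, a half-width of about 0.0614, and a relative slowdown of about 3.75%, in line with the reported values.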

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

int3 created this revision. May 25 2021, 1:33 PM

Herald added a reviewer: gkm. May 25 2021, 1:33 PM

Herald added a project: Restricted Project.

int3 requested review of this revision. May 25 2021, 1:33 PM

Herald added a project: Restricted Project. May 25 2021, 1:33 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B106150: Diff 347771. May 25 2021, 1:34 PM

remove redundant test change

Harbormaster completed remote builds in B106151: Diff 347772. May 25 2021, 1:34 PM

int3 added inline comments. May 25 2021, 1:35 PM

lld/test/MachO/mattrs.ll
14	The LLVM IR in this test generated literals, which got moved to different addresses in this diff. This test doesn't actually care about the locations of the literals, though, so I've changed it accordingly.

int3 planned changes to this revision. May 25 2021, 1:38 PM

This does not compile:

std::vector<String<4>> strings4;
strings4.reserve(number_of_4_strings*sizeof(String<4>));
for (size_t i = 0, e = isec->data.size() / 4; i < e; ++i) {
  strings4.emplace_back(String<4>(xxx));
}
strings4.sort();
auto bla = std::unique(strings4.begin(), strings4.end());

The hash map will have a lot of mallocs. I would expect sort + unique to be faster. Maybe even parallel sort. String<4> could be like a StringRef, but with fixed size (4).

The for-loop could be llvm::parallel_for_each_n(...)
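
A compilable take on the sort + unique idea above might look like the following. This is only a sketch under my own assumptions: `Literal4` is a stand-in for the proposed `String<4>`, and `sortUnique4` / `offsetOf` are invented names. Note that, unlike the hash map, this approach needs a binary search to recover each literal's output offset.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed-width value type standing in for the String<4> idea.
using Literal4 = std::array<uint8_t, 4>;
static_assert(sizeof(Literal4) == 4, "bulk copy below relies on this");

// Copy the section contents into fixed-width values, then sort and drop
// adjacent duplicates.
std::vector<Literal4> sortUnique4(const uint8_t *data, size_t size) {
  std::vector<Literal4> lits(size / 4);
  if (!lits.empty())
    std::memcpy(lits.data(), data, lits.size() * 4); // one bulk copy
  std::sort(lits.begin(), lits.end());
  lits.erase(std::unique(lits.begin(), lits.end()), lits.end());
  return lits;
}

// Output offset of a literal: its rank in the sorted, deduplicated vector.
// The value must be present in lits.
uint64_t offsetOf(const std::vector<Literal4> &lits, const Literal4 &value) {
  auto it = std::lower_bound(lits.begin(), lits.end(), value);
  return static_cast<uint64_t>(it - lits.begin()) * 4;
}
```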

In D103113#2782094, @tschuett wrote:
This does not compile:
std::vector<String<4>> strings4;
strings4.reserve(number_of_4_strings*sizeof(String<4>));
for (size_t i = 0, e = isec->data.size() / 4; i < e; ++i) {
  strings4.emplace_back(String<4>(xxx));
}
strings4.sort();
auto bla = std::unique(strings4.begin(), strings4.end());
The hash map will have a lot of mallocs. I would expect sort + unique to be faster. Maybe even parallel sort.

The emplace loop could even be a memcpy?!? Or do everything in-place?

The hash map will have a lot of mallocs. I would expect sort + unique to be faster.

I guess I could reserve() memory for the hashmap before inserting...

But as the commit message notes, I'm aware this is far from an optimal implementation, and I think we can revisit this later after building out our other features. I'm mostly interested in this to see how much of a win it'll unlock when coupled with ICF.
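
For reference, the `reserve()` idea mentioned above could look roughly like this (a sketch with an invented `buildMap` helper, not code from the diff). It sizes the bucket array for the worst case where every input value is distinct:

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Build a literal -> index map, pre-sizing the bucket array so that the
// inserts below never trigger a rehash.
std::unordered_map<uint32_t, uint64_t> buildMap(const uint32_t *values,
                                                size_t n) {
  std::unordered_map<uint32_t, uint64_t> map;
  map.reserve(n); // room for n elements without exceeding max_load_factor
  for (size_t i = 0; i < n; ++i)
    map.emplace(values[i], map.size());
  return map;
}
```

This only removes rehashing. `std::unordered_map` is node-based, so each distinct element still costs one allocation, which is the overhead the earlier comment was worried about.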

In D103113#2783198, @int3 wrote:

The hash map will have a lot of mallocs. I would expect sort + unique to be faster.

I guess I could reserve() memory for the hashmap before inserting...

But as the commit message notes, I'm aware this is far from an optimal implementation, and I think we can revisit this later after building out our other features. I'm mostly interested in this to see how much of a win it'll unlock when coupled with ICF.

No worries.

update

Harbormaster completed remote builds in B107337: Diff 349398. Jun 2 2021, 3:22 PM

This was very pleasant to read & review!

lld/MachO/SyntheticSections.cpp
1160–1166	What is the purpose of the braces around these `case` bodies?

This revision is now accepted and ready to land. Jun 8 2021, 5:01 PM

int3 added inline comments. Jun 8 2021, 10:05 PM

lld/MachO/SyntheticSections.cpp
1160–1166	Declaring variables whose scope can leak out of the switch-case raises the error "cannot jump from switch statement to this case label... jump bypasses variable initialization". In this case it's not actually needed since `i` and `value` are scoped within the `for` loop, but nonetheless I like putting braces around non-trivial case blocks so I don't have to worry about this issue.
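
To make the point about braces concrete, here is a minimal standalone example (the function name is invented). With the braces removed from the first case, the declaration of `bytes` would still be in scope at the `case 8:` label, and jumping to that label would bypass its initialization, which the compiler rejects.

```cpp
#include <cstdint>

uint64_t literalBytes(int width, uint64_t count) {
  switch (width) {
  case 4: {
    // The braces end the scope of `bytes` before the next case label.
    uint64_t bytes = count * 4;
    return bytes;
  }
  case 8: {
    uint64_t bytes = count * 8;
    return bytes;
  }
  default:
    return 0;
  }
}
```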

Thanks!

This revision was landed with ongoing or failed builds. Jun 11 2021, 4:50 PM

Closed by commit rG5d88f2dd9478: [lld-macho] Deduplicate fixed-width literals (authored by int3).

This revision was automatically updated to reflect the committed changes.

int3 added a commit: rG5d88f2dd9478: [lld-macho] Deduplicate fixed-width literals.

Revision Contents

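Path / Size

lld/MachO/InputFiles.cpp: 17 lines
lld/MachO/InputSection.h: 19 lines
lld/MachO/InputSection.cpp: 19 lines
lld/MachO/SyntheticSections.h: 55 lines
lld/MachO/SyntheticSections.cpp: 58 lines
lld/MachO/Writer.cpp: 23 lines
lld/test/MachO/literal-dedup.s: 110 lines
lld/test/MachO/mattrs.ll: 4 lines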

Diff 351585

lld/MachO/InputFiles.cpp

	Show First 20 Lines • Show All 259 Lines • ▼ Show 20 Lines
	}			}

	template <class Section>			template <class Section>
	void ObjFile::parseSections(ArrayRef<Section> sections) {			void ObjFile::parseSections(ArrayRef<Section> sections) {
	subsections.reserve(sections.size());			subsections.reserve(sections.size());
	auto *buf = reinterpret_cast<const uint8_t *>(mb.getBufferStart());			auto *buf = reinterpret_cast<const uint8_t *>(mb.getBufferStart());

	for (const Section &sec : sections) {			for (const Section &sec : sections) {
	if (config->dedupLiterals && sectionType(sec.flags) == S_CSTRING_LITERALS) {			if (config->dedupLiterals &&
				(sectionType(sec.flags) == S_CSTRING_LITERALS ||
				isWordLiteralSection(sec.flags))) {
	if (sec.nreloc)			if (sec.nreloc)
	fatal(toString(this) + " contains relocations in " + sec.segname + "," +			fatal(toString(this) + " contains relocations in " + sec.segname + "," +
	sec.sectname +			sec.sectname +
	", so LLD cannot deduplicate literals. Try re-running without "			", so LLD cannot deduplicate literals. Try re-running without "
	"--deduplicate-literals.");			"--deduplicate-literals.");

	auto *isec = make<CStringInputSection>();			InputSection *isec;
				if (sectionType(sec.flags) == S_CSTRING_LITERALS) {
				isec = make<CStringInputSection>();
	parseSection(this, buf, sec, isec);			parseSection(this, buf, sec, isec);
	isec->splitIntoPieces(); // FIXME: parallelize this?			// FIXME: parallelize this?
				cast<CStringInputSection>(isec)->splitIntoPieces();
				} else {
				isec = make<WordLiteralInputSection>();
				parseSection(this, buf, sec, isec);
				}
	subsections.push_back({{0, isec}});			subsections.push_back({{0, isec}});
	} else {			} else {
	auto *isec = make<ConcatInputSection>();			auto *isec = make<ConcatInputSection>();
	parseSection(this, buf, sec, isec);			parseSection(this, buf, sec, isec);
	if (!(isDebugSection(isec->flags) &&			if (!(isDebugSection(isec->flags) &&
	isec->segname == segment_names::dwarf)) {			isec->segname == segment_names::dwarf)) {
	subsections.push_back({{0, isec}});			subsections.push_back({{0, isec}});
	} else {			} else {
	▲ Show 20 Lines • Show All 909 Lines • Show Last 20 Lines

lld/MachO/InputSection.h

Show All 22 Lines
class InputFile;		class InputFile;
class OutputSection;		class OutputSection;

class InputSection {		class InputSection {
public:		public:
enum Kind {		enum Kind {
ConcatKind,		ConcatKind,
CStringLiteralKind,		CStringLiteralKind,
		WordLiteralKind,
};		};

Kind kind() const { return sectionKind; }		Kind kind() const { return sectionKind; }
virtual ~InputSection() = default;		virtual ~InputSection() = default;
virtual uint64_t getSize() const { return data.size(); }		virtual uint64_t getSize() const { return data.size(); }
uint64_t getFileSize() const;		uint64_t getFileSize() const;
// Translates \p off -- an offset relative to this InputSection -- into an		// Translates \p off -- an offset relative to this InputSection -- into an
// offset from the beginning of its parent OutputSection.		// offset from the beginning of its parent OutputSection.
▲ Show 20 Lines • Show All 102 Lines • ▼ Show 20 Lines	public:

static bool classof(const InputSection *isec) {		static bool classof(const InputSection *isec) {
return isec->kind() == CStringLiteralKind;		return isec->kind() == CStringLiteralKind;
}		}

std::vector<StringPiece> pieces;		std::vector<StringPiece> pieces;
};		};

		class WordLiteralInputSection : public InputSection {
		public:
		WordLiteralInputSection() : InputSection(WordLiteralKind) {}
		uint64_t getFileOffset(uint64_t off) const override;
		uint64_t getOffset(uint64_t off) const override;

		static bool classof(const InputSection *isec) {
		return isec->kind() == WordLiteralKind;
		}
		};

inline uint8_t sectionType(uint32_t flags) {		inline uint8_t sectionType(uint32_t flags) {
return flags & llvm::MachO::SECTION_TYPE;		return flags & llvm::MachO::SECTION_TYPE;
}		}

inline bool isZeroFill(uint32_t flags) {		inline bool isZeroFill(uint32_t flags) {
return llvm::MachO::isVirtualSection(sectionType(flags));		return llvm::MachO::isVirtualSection(sectionType(flags));
}		}

inline bool isThreadLocalVariables(uint32_t flags) {		inline bool isThreadLocalVariables(uint32_t flags) {
return sectionType(flags) == llvm::MachO::S_THREAD_LOCAL_VARIABLES;		return sectionType(flags) == llvm::MachO::S_THREAD_LOCAL_VARIABLES;
}		}

// These sections contain the data for initializing thread-local variables.		// These sections contain the data for initializing thread-local variables.
inline bool isThreadLocalData(uint32_t flags) {		inline bool isThreadLocalData(uint32_t flags) {
return sectionType(flags) == llvm::MachO::S_THREAD_LOCAL_REGULAR ||		return sectionType(flags) == llvm::MachO::S_THREAD_LOCAL_REGULAR ||
sectionType(flags) == llvm::MachO::S_THREAD_LOCAL_ZEROFILL;		sectionType(flags) == llvm::MachO::S_THREAD_LOCAL_ZEROFILL;
}		}

inline bool isDebugSection(uint32_t flags) {		inline bool isDebugSection(uint32_t flags) {
return (flags & llvm::MachO::SECTION_ATTRIBUTES_USR) ==		return (flags & llvm::MachO::SECTION_ATTRIBUTES_USR) ==
llvm::MachO::S_ATTR_DEBUG;		llvm::MachO::S_ATTR_DEBUG;
}		}

		inline bool isWordLiteralSection(uint32_t flags) {
		return sectionType(flags) == llvm::MachO::S_4BYTE_LITERALS ||
		sectionType(flags) == llvm::MachO::S_8BYTE_LITERALS ||
		sectionType(flags) == llvm::MachO::S_16BYTE_LITERALS;
		}

bool isCodeSection(const InputSection *);		bool isCodeSection(const InputSection *);

extern std::vector<InputSection *> inputSections;		extern std::vector<InputSection *> inputSections;

namespace section_names {		namespace section_names {

constexpr const char authGot[] = "__auth_got";		constexpr const char authGot[] = "__auth_got";
constexpr const char authPtr[] = "__auth_ptr";		constexpr const char authPtr[] = "__auth_ptr";
Show All 12 Lines
constexpr const char export_[] = "__export";		constexpr const char export_[] = "__export";
constexpr const char functionStarts[] = "__func_starts";		constexpr const char functionStarts[] = "__func_starts";
constexpr const char got[] = "__got";		constexpr const char got[] = "__got";
constexpr const char header[] = "__mach_header";		constexpr const char header[] = "__mach_header";
constexpr const char indirectSymbolTable[] = "__ind_sym_tab";		constexpr const char indirectSymbolTable[] = "__ind_sym_tab";
constexpr const char const_[] = "__const";		constexpr const char const_[] = "__const";
constexpr const char lazySymbolPtr[] = "__la_symbol_ptr";		constexpr const char lazySymbolPtr[] = "__la_symbol_ptr";
constexpr const char lazyBinding[] = "__lazy_binding";		constexpr const char lazyBinding[] = "__lazy_binding";
		constexpr const char literals[] = "__literals";
constexpr const char moduleInitFunc[] = "__mod_init_func";		constexpr const char moduleInitFunc[] = "__mod_init_func";
constexpr const char moduleTermFunc[] = "__mod_term_func";		constexpr const char moduleTermFunc[] = "__mod_term_func";
constexpr const char nonLazySymbolPtr[] = "__nl_symbol_ptr";		constexpr const char nonLazySymbolPtr[] = "__nl_symbol_ptr";
constexpr const char objcCatList[] = "__objc_catlist";		constexpr const char objcCatList[] = "__objc_catlist";
constexpr const char objcClassList[] = "__objc_classlist";		constexpr const char objcClassList[] = "__objc_classlist";
constexpr const char objcConst[] = "__objc_const";		constexpr const char objcConst[] = "__objc_const";
constexpr const char objcImageInfo[] = "__objc_imageinfo";		constexpr const char objcImageInfo[] = "__objc_imageinfo";
constexpr const char objcNonLazyCatList[] = "__objc_nlcatlist";		constexpr const char objcNonLazyCatList[] = "__objc_nlcatlist";
Show All 28 Lines
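
The `Kind` tag plus `classof` pattern in this header is what lets `dyn_cast<WordLiteralInputSection>` work without C++ RTTI. A stripped-down sketch of the mechanism, with invented class names and a hand-rolled `dynCast` standing in for `llvm::dyn_cast`:

```cpp
// Base class carrying a kind tag, mirroring InputSection::Kind.
class Section {
public:
  enum Kind { ConcatKind, CStringLiteralKind, WordLiteralKind };
  explicit Section(Kind k) : sectionKind(k) {}
  virtual ~Section() = default;
  Kind kind() const { return sectionKind; }

private:
  Kind sectionKind;
};

class ConcatSection : public Section {
public:
  ConcatSection() : Section(ConcatKind) {}
  static bool classof(const Section *s) { return s->kind() == ConcatKind; }
};

class WordLiteral : public Section {
public:
  WordLiteral() : Section(WordLiteralKind) {}
  static bool classof(const Section *s) {
    return s->kind() == WordLiteralKind;
  }
};

// Simplified stand-in for llvm::dyn_cast: consult the target's classof()
// and return nullptr when the kind tag does not match.
template <class To> To *dynCast(Section *s) {
  return To::classof(s) ? static_cast<To *>(s) : nullptr;
}
```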

lld/MachO/InputSection.cpp

	Show First 20 Lines • Show All 121 Lines • ▼ Show 20 Lines
	}			}

	uint64_t CStringInputSection::getOffset(uint64_t off) const {			uint64_t CStringInputSection::getOffset(uint64_t off) const {
	const StringPiece &piece = getStringPiece(off);			const StringPiece &piece = getStringPiece(off);
	uint64_t addend = off - piece.inSecOff;			uint64_t addend = off - piece.inSecOff;
	return piece.outSecOff + addend;			return piece.outSecOff + addend;
	}			}

				uint64_t WordLiteralInputSection::getFileOffset(uint64_t off) const {
				return parent->fileOff + getOffset(off);
				}

				uint64_t WordLiteralInputSection::getOffset(uint64_t off) const {
				auto *osec = cast<WordLiteralSection>(parent);
				const uint8_t *buf = data.data();
				switch (sectionType(flags)) {
				case S_4BYTE_LITERALS:
				return osec->getLiteral4Offset(buf + off);
				case S_8BYTE_LITERALS:
				return osec->getLiteral8Offset(buf + off);
				case S_16BYTE_LITERALS:
				return osec->getLiteral16Offset(buf + off);
				default:
				llvm_unreachable("invalid literal section type");
				}
				}

	bool macho::isCodeSection(const InputSection *isec) {			bool macho::isCodeSection(const InputSection *isec) {
	uint32_t type = sectionType(isec->flags);			uint32_t type = sectionType(isec->flags);
	if (type != S_REGULAR && type != S_COALESCED)			if (type != S_REGULAR && type != S_COALESCED)
	return false;			return false;

	uint32_t attr = isec->flags & SECTION_ATTRIBUTES_USR;			uint32_t attr = isec->flags & SECTION_ATTRIBUTES_USR;
	if (attr == S_ATTR_PURE_INSTRUCTIONS)			if (attr == S_ATTR_PURE_INSTRUCTIONS)
	return true;			return true;
	Show All 12 Lines

lld/MachO/SyntheticSections.h

Show All 16 Lines
#include "Target.h"		#include "Target.h"

#include "llvm/ADT/Hashing.h"		#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/MC/StringTableBuilder.h"		#include "llvm/MC/StringTableBuilder.h"
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

		#include <unordered_map>

namespace llvm {		namespace llvm {
class DWARFUnit;		class DWARFUnit;
} // namespace llvm		} // namespace llvm

namespace lld {		namespace lld {
namespace macho {		namespace macho {

class Defined;		class Defined;
▲ Show 20 Lines • Show All 259 Lines • ▼ Show 20 Lines	public:
bool isNeeded() const override { return !entries.empty(); }		bool isNeeded() const override { return !entries.empty(); }
void finalize() override;		void finalize() override;
void writeTo(uint8_t *buf) const override;		void writeTo(uint8_t *buf) const override;
const llvm::SetVector<Symbol *> &getEntries() const { return entries; }		const llvm::SetVector<Symbol *> &getEntries() const { return entries; }
// Returns whether the symbol was added. Note that every stubs entry will		// Returns whether the symbol was added. Note that every stubs entry will
// have a corresponding entry in the LazyPointerSection.		// have a corresponding entry in the LazyPointerSection.
bool addEntry(Symbol *);		bool addEntry(Symbol *);
uint64_t getVA(uint32_t stubsIndex) const {		uint64_t getVA(uint32_t stubsIndex) const {
		assert(isFinal || target->usesThunks());
// ConcatOutputSection::finalize() can seek the address of a		// ConcatOutputSection::finalize() can seek the address of a
// stub before its address is assigned. Before __stubs is		// stub before its address is assigned. Before __stubs is
// finalized, return a contrived out-of-range address.		// finalized, return a contrived out-of-range address.
return isFinal ? addr + stubsIndex * target->stubSize		return isFinal ? addr + stubsIndex * target->stubSize
: TargetInfo::outOfRangeVA;		: TargetInfo::outOfRangeVA;
}		}

bool isFinal = false; // is address assigned?		bool isFinal = false; // is address assigned?
▲ Show 20 Lines • Show All 218 Lines • ▼ Show 20 Lines	public:
void writeTo(uint8_t *buf) const override { builder.write(buf); }		void writeTo(uint8_t *buf) const override { builder.write(buf); }

std::vector<CStringInputSection *> inputs;		std::vector<CStringInputSection *> inputs;

private:		private:
llvm::StringTableBuilder builder;		llvm::StringTableBuilder builder;
};		};

		/*
		* This section contains deduplicated literal values. The 16-byte values are
		* laid out first, followed by the 8- and then the 4-byte ones.
		*/
		class WordLiteralSection : public SyntheticSection {
		public:
		using UInt128 = std::pair<uint64_t, uint64_t>;
		// I don't think the standard guarantees the size of a pair, so let's make
		// sure it's exact -- that way we can construct it via `mmap`.
		static_assert(sizeof(UInt128) == 16, "");

		WordLiteralSection();
		void addInput(WordLiteralInputSection *);
		void writeTo(uint8_t *buf) const override;

		uint64_t getSize() const override {
		return literal16Map.size() * 16 + literal8Map.size() * 8 +
		literal4Map.size() * 4;
		}

		bool isNeeded() const override {
		return !literal16Map.empty() || !literal4Map.empty() ||
		!literal8Map.empty();
		}

		uint64_t getLiteral16Offset(const uint8_t *buf) const {
		return literal16Map.at(*reinterpret_cast<const UInt128 *>(buf)) * 16;
		}

		uint64_t getLiteral8Offset(const uint8_t *buf) const {
		return literal16Map.size() * 16 +
		literal8Map.at(*reinterpret_cast<const uint64_t *>(buf)) * 8;
		}

		uint64_t getLiteral4Offset(const uint8_t *buf) const {
		return literal16Map.size() * 16 + literal8Map.size() * 8 +
		literal4Map.at(*reinterpret_cast<const uint32_t *>(buf)) * 4;
		}

		private:
		template <class T> struct Hasher {
		llvm::hash_code operator()(T v) const { return llvm::hash_value(v); }
		};
		// We're using unordered_map instead of DenseMap here because we need to
		// support all possible integer values -- there are no suitable tombstone
		// values for DenseMap.
		std::unordered_map<UInt128, uint64_t, Hasher<UInt128>> literal16Map;
		std::unordered_map<uint64_t, uint64_t> literal8Map;
		std::unordered_map<uint32_t, uint64_t> literal4Map;
		};

struct InStruct {		struct InStruct {
MachHeaderSection *header = nullptr;		MachHeaderSection *header = nullptr;
CStringSection *cStringSection = nullptr;		CStringSection *cStringSection = nullptr;
		WordLiteralSection *wordLiteralSection = nullptr;
RebaseSection *rebase = nullptr;		RebaseSection *rebase = nullptr;
BindingSection *binding = nullptr;		BindingSection *binding = nullptr;
WeakBindingSection *weakBinding = nullptr;		WeakBindingSection *weakBinding = nullptr;
LazyBindingSection *lazyBinding = nullptr;		LazyBindingSection *lazyBinding = nullptr;
ExportSection *exports = nullptr;		ExportSection *exports = nullptr;
GotSection *got = nullptr;		GotSection *got = nullptr;
TlvPointerSection *tlvPointers = nullptr;		TlvPointerSection *tlvPointers = nullptr;
LazyPointerSection *lazyPointers = nullptr;		LazyPointerSection *lazyPointers = nullptr;
Show All 15 Lines
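
The offset arithmetic in `WordLiteralSection` (16-byte values first, then 8-byte, then 4-byte) can be tried out in isolation with a sketch like the one below. The `LiteralLayout` and `PairHasher` names are mine, and the pair hasher is a simple placeholder for the `llvm::hash_value` wrapper in the patch. `std::hash` has no specialization for `std::pair`, so some hasher has to be supplied.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

using UInt128 = std::pair<uint64_t, uint64_t>;

// Placeholder hasher for the 16-byte key; not the hash used by the patch.
struct PairHasher {
  size_t operator()(const UInt128 &v) const {
    return std::hash<uint64_t>()(v.first) * 31 +
           std::hash<uint64_t>()(v.second);
  }
};

struct LiteralLayout {
  std::unordered_map<UInt128, uint64_t, PairHasher> lit16;
  std::unordered_map<uint64_t, uint64_t> lit8;
  std::unordered_map<uint32_t, uint64_t> lit4;

  void add16(UInt128 v) { lit16.emplace(v, lit16.size()); }
  void add8(uint64_t v) { lit8.emplace(v, lit8.size()); }
  void add4(uint32_t v) { lit4.emplace(v, lit4.size()); }

  // Each width starts where the wider ones end.
  uint64_t offset16(UInt128 v) const { return lit16.at(v) * 16; }
  uint64_t offset8(uint64_t v) const {
    return lit16.size() * 16 + lit8.at(v) * 8;
  }
  uint64_t offset4(uint32_t v) const {
    return lit16.size() * 16 + lit8.size() * 8 + lit4.at(v) * 4;
  }
  uint64_t size() const {
    return lit16.size() * 16 + lit8.size() * 8 + lit4.size() * 4;
  }
};
```

Since the 8- and 4-byte offsets depend on how many wider literals exist, they are only meaningful once every input has been added. Feeding in the values from literal-dedup.s (two distinct literals of each width) yields offsets 0 and 16, then 32 and 40, then 48 and 52, which lines up with the `DEADBEEF8 + 8` and `DEADBEEF4 + 4` checks in that test.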

lld/MachO/SyntheticSections.cpp

Show First 20 Lines • Show All 1,135 Lines • ▼ Show 20 Lines for (CStringInputSection *isec : inputs) {

for (size_t i = 0, e = isec->pieces.size(); i != e; ++i) { for (size_t i = 0, e = isec->pieces.size(); i != e; ++i) {

isec->pieces[i].outSecOff = isec->pieces[i].outSecOff =

builder.getOffset(isec->getCachedHashStringRef(i)); builder.getOffset(isec->getCachedHashStringRef(i));

isec->isFinal = true; isec->isFinal = true;

} }

// This section is actually emitted as __TEXT,__const by ld64, but clang may
// emit input sections of that name, and LLD doesn't currently support mixing
// synthetic and concat-type OutputSections. To work around this, I've given
// our merged-literals section a different name.
WordLiteralSection::WordLiteralSection()
    : SyntheticSection(segment_names::text, section_names::literals) {
  align = 16;
}

void WordLiteralSection::addInput(WordLiteralInputSection *isec) {
  isec->parent = this;
  // We do all processing of the InputSection here, so it will be effectively
  // finalized.
  isec->isFinal = true;
  const uint8_t *buf = isec->data.data();
  switch (sectionType(isec->flags)) {
  case S_4BYTE_LITERALS: {
    for (size_t i = 0, e = isec->data.size() / 4; i < e; ++i) {
      uint32_t value = *reinterpret_cast<const uint32_t *>(buf + i * 4);
      literal4Map.emplace(value, literal4Map.size());
    }
    break;
  }

gkm (Not Done):

  switch (sectionType(isec->flags)) {
- case S_4BYTE_LITERALS: {
+ case S_4BYTE_LITERALS:
    for (size_t i = 0, e = isec->data.size() / 4; i < e; ++i) {
      uint32_t value = *reinterpret_cast<const uint32_t *>(buf + i * 4);
      literal4Map.emplace(value, literal4Map.size());
    }
    break;
- }
  case S_8BYTE_LITERALS: {

What is the purpose of the braces around these `case` bodies?

int3 (Author, Done):

Declaring variables whose scope can leak out of the switch-case raises the error "cannot jump from switch statement to this case label... jump bypasses variable initialization". In this case it's not actually needed since `i` and `value` are scoped within the `for` loop, but nonetheless I like putting braces around non-trivial case blocks so I don't have to worry about this issue.

  case S_8BYTE_LITERALS: {
    for (size_t i = 0, e = isec->data.size() / 8; i < e; ++i) {
      uint64_t value = *reinterpret_cast<const uint64_t *>(buf + i * 8);
      literal8Map.emplace(value, literal8Map.size());
    }
    break;
  }
  case S_16BYTE_LITERALS: {
    for (size_t i = 0, e = isec->data.size() / 16; i < e; ++i) {
      UInt128 value = *reinterpret_cast<const UInt128 *>(buf + i * 16);
      literal16Map.emplace(value, literal16Map.size());
    }
    break;
  }
  default:
    llvm_unreachable("invalid literal section type");
  }
}

void WordLiteralSection::writeTo(uint8_t *buf) const {
  // Note that we don't attempt to do any endianness conversion in addInput(),
  // so we don't do it here either -- just write out the original value,
  // byte-for-byte.
  for (const auto &p : literal16Map)
    memcpy(buf + p.second * 16, &p.first, 16);
  buf += literal16Map.size() * 16;

  for (const auto &p : literal8Map)
    memcpy(buf + p.second * 8, &p.first, 8);
  buf += literal8Map.size() * 8;

  for (const auto &p : literal4Map)
    memcpy(buf + p.second * 4, &p.first, 4);
}

void macho::createSyntheticSymbols() { void macho::createSyntheticSymbols() {

auto addHeaderSymbol = [](const char *name) { auto addHeaderSymbol = [](const char *name) {

symtab->addSynthetic(name, in.header->isec, /*value=*/0, symtab->addSynthetic(name, in.header->isec, /*value=*/0,

/*privateExtern=*/true, /*includeInSymtab=*/false, /*privateExtern=*/true, /*includeInSymtab=*/false,

/*referencedDynamically=*/false); /*referencedDynamically=*/false);

}; };

switch (config->outputType) { switch (config->outputType) {

▲ Show 20 Lines • Show All 46 Lines • Show Last 20 Lines

lld/MachO/Writer.cpp

Show First 20 Lines • Show All 857 Lines • ▼ Show 20 Lines	template <class LP> void Writer::createOutputSections() {
}		}

// Then add input sections to output sections.		// Then add input sections to output sections.
DenseMap<NamePair, ConcatOutputSection *> concatOutputSections;		DenseMap<NamePair, ConcatOutputSection *> concatOutputSections;
for (const auto &p : enumerate(inputSections)) {		for (const auto &p : enumerate(inputSections)) {
InputSection *isec = p.value();		InputSection *isec = p.value();
if (isec->shouldOmitFromOutput())		if (isec->shouldOmitFromOutput())
continue;		continue;
		OutputSection *osec;
if (auto *concatIsec = dyn_cast<ConcatInputSection>(isec)) {		if (auto *concatIsec = dyn_cast<ConcatInputSection>(isec)) {
NamePair names = maybeRenameSection({isec->segname, isec->name});		NamePair names = maybeRenameSection({isec->segname, isec->name});
ConcatOutputSection *&osec = concatOutputSections[names];		ConcatOutputSection *&concatOsec = concatOutputSections[names];
if (osec == nullptr) {		if (concatOsec == nullptr)
osec = make<ConcatOutputSection>(names.second);		concatOsec = make<ConcatOutputSection>(names.second);
osec->inputOrder = p.index();		concatOsec->addInput(concatIsec);
}		osec = concatOsec;
osec->addInput(concatIsec);
} else if (auto *cStringIsec = dyn_cast<CStringInputSection>(isec)) {		} else if (auto *cStringIsec = dyn_cast<CStringInputSection>(isec)) {
if (in.cStringSection->inputs.empty())
in.cStringSection->inputOrder = p.index();
in.cStringSection->addInput(cStringIsec);		in.cStringSection->addInput(cStringIsec);
		osec = in.cStringSection;
		} else if (auto *litIsec = dyn_cast<WordLiteralInputSection>(isec)) {
		in.wordLiteralSection->addInput(litIsec);
		osec = in.wordLiteralSection;
		} else {
		llvm_unreachable("unhandled InputSection type");
}		}
		osec->inputOrder = std::min(osec->inputOrder, static_cast<int>(p.index()));
}		}

// Once all the inputs are added, we can finalize the output section		// Once all the inputs are added, we can finalize the output section
// properties and create the corresponding output segments.		// properties and create the corresponding output segments.
for (const auto &it : concatOutputSections) {		for (const auto &it : concatOutputSections) {
StringRef segname = it.first.first;		StringRef segname = it.first.first;
ConcatOutputSection *osec = it.second;		ConcatOutputSection *osec = it.second;
if (segname == segment_names::ld) {		if (segname == segment_names::ld) {
▲ Show 20 Lines • Show All 158 Lines • ▼ Show 20 Lines	template <class LP> void Writer::run() {
writeOutputFile();		writeOutputFile();
}		}

template <class LP> void macho::writeResult() { Writer().run<LP>(); }		template <class LP> void macho::writeResult() { Writer().run<LP>(); }

void macho::createSyntheticSections() {		void macho::createSyntheticSections() {
in.header = make<MachHeaderSection>();		in.header = make<MachHeaderSection>();
in.cStringSection = config->dedupLiterals ? make<CStringSection>() : nullptr;		in.cStringSection = config->dedupLiterals ? make<CStringSection>() : nullptr;
		in.wordLiteralSection =
		config->dedupLiterals ? make<WordLiteralSection>() : nullptr;
in.rebase = make<RebaseSection>();		in.rebase = make<RebaseSection>();
in.binding = make<BindingSection>();		in.binding = make<BindingSection>();
in.weakBinding = make<WeakBindingSection>();		in.weakBinding = make<WeakBindingSection>();
in.lazyBinding = make<LazyBindingSection>();		in.lazyBinding = make<LazyBindingSection>();
in.exports = make<ExportSection>();		in.exports = make<ExportSection>();
in.got = make<GotSection>();		in.got = make<GotSection>();
in.tlvPointers = make<TlvPointerSection>();		in.tlvPointers = make<TlvPointerSection>();
in.lazyPointers = make<LazyPointerSection>();		in.lazyPointers = make<LazyPointerSection>();
Show All 10 Lines

lld/test/MachO/literal-dedup.s

This file was added.

				# REQUIRES: x86
				# RUN: rm -rf %t; split-file %s %t
				# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/test.s -o %t/test.o
				# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/qux.s -o %t/qux.o
				# RUN: %lld -dylib --deduplicate-literals %t/test.o %t/qux.o -o %t/test
				# RUN: llvm-objdump --macho --section="__TEXT,__literals" --section="__DATA,ptrs" --syms %t/test | FileCheck %s
				# RUN: llvm-readobj --section-headers %t/test | FileCheck %s --check-prefix=HEADER

				# CHECK: Contents of (__TEXT,__literals) section
				# CHECK-NEXT: [[#%.16x,DEADBEEF16:]] ef be ad de ef be ad de ef be ad de ef be ad de
				# CHECK-NEXT: [[#%.16x,FEEDFACE16:]] ce fa ed fe ce fa ed fe ce fa ed fe ce fa ed fe
				# CHECK-NEXT: [[#%.16x,DEADBEEF8:]] ef be ad de ef be ad de ce fa ed fe ce fa ed fe
				# CHECK-NEXT: [[#%.16x,DEADBEEF4:]] ef be ad de ce fa ed fe
				# CHECK-NEXT: Contents of (__DATA,ptrs) section
				# CHECK-NEXT: 0000000000001000 0x[[#%x,DEADBEEF16]]
				# CHECK-NEXT: 0000000000001008 0x[[#%x,DEADBEEF16]]
				# CHECK-NEXT: 0000000000001010 0x[[#%x,FEEDFACE16]]
				# CHECK-NEXT: 0000000000001018 0x[[#%x,DEADBEEF16]]
				# CHECK-NEXT: 0000000000001020 0x[[#%x,DEADBEEF8]]
				# CHECK-NEXT: 0000000000001028 0x[[#%x,DEADBEEF8]]
				# CHECK-NEXT: 0000000000001030 0x[[#%x,DEADBEEF8 + 8]]
				# CHECK-NEXT: 0000000000001038 0x[[#%x,DEADBEEF8]]
				# CHECK-NEXT: 0000000000001040 0x[[#%x,DEADBEEF4]]
				# CHECK-NEXT: 0000000000001048 0x[[#%x,DEADBEEF4]]
				# CHECK-NEXT: 0000000000001050 0x[[#%x,DEADBEEF4 + 4]]
				# CHECK-NEXT: 0000000000001058 0x[[#%x,DEADBEEF4]]

				## Make sure the symbol addresses are correct too.
				# CHECK: SYMBOL TABLE:
				# CHECK-DAG: [[#DEADBEEF16]] g O __TEXT,__literals _qux16
				# CHECK-DAG: [[#DEADBEEF8]] g O __TEXT,__literals _qux8
				# CHECK-DAG: [[#DEADBEEF4]] g O __TEXT,__literals _qux4

				## Make sure we set the right alignment and flags.
				# HEADER: Name: __literals
				# HEADER-NEXT: Segment: __TEXT
				# HEADER-NEXT: Address:
				# HEADER-NEXT: Size:
				# HEADER-NEXT: Offset:
				# HEADER-NEXT: Alignment: 4
				# HEADER-NEXT: RelocationOffset:
				# HEADER-NEXT: RelocationCount: 0
				# HEADER-NEXT: Type: Regular
				# HEADER-NEXT: Attributes [ (0x0)
				# HEADER-NEXT: ]
				# HEADER-NEXT: Reserved1: 0x0
				# HEADER-NEXT: Reserved2: 0x0
				# HEADER-NEXT: Reserved3: 0x0

				#--- test.s
				.literal4
				.p2align 2
				L._foo4:
				.long 0xdeadbeef
				L._bar4:
				.long 0xdeadbeef
				L._baz4:
				.long 0xfeedface

				.literal8
				L._foo8:
				.quad 0xdeadbeefdeadbeef
				L._bar8:
				.quad 0xdeadbeefdeadbeef
				L._baz8:
				.quad 0xfeedfacefeedface

				.literal16
				L._foo16:
				.quad 0xdeadbeefdeadbeef
				.quad 0xdeadbeefdeadbeef
				L._bar16:
				.quad 0xdeadbeefdeadbeef
				.quad 0xdeadbeefdeadbeef
				L._baz16:
				.quad 0xfeedfacefeedface
				.quad 0xfeedfacefeedface

				.section __DATA,ptrs,literal_pointers
				.quad L._foo16
				.quad L._bar16
				.quad L._baz16
				.quad _qux16

				.quad L._foo8
				.quad L._bar8
				.quad L._baz8
				.quad _qux8

				.quad L._foo4
				.quad L._bar4
				.quad L._baz4
				.quad _qux4

				#--- qux.s
				.globl _qux4, _qux8, _qux16

				.literal4
				.p2align 2
				_qux4:
				.long 0xdeadbeef

				.literal8
				_qux8:
				.quad 0xdeadbeefdeadbeef

				.literal16
				_qux16:
				.quad 0xdeadbeefdeadbeef
				.quad 0xdeadbeefdeadbeef

lld/test/MachO/mattrs.ll

	; REQUIRES: x86			; REQUIRES: x86
	; RUN: llvm-as %s -o %t.o			; RUN: llvm-as %s -o %t.o

	;; Verify that LTO behavior can be tweaked using -mattr.			;; Verify that LTO behavior can be tweaked using -mattr.

	; RUN: %lld -mcpu haswell -mllvm -mattr=+fma %t.o -o %t.dylib -dylib			; RUN: %lld -mcpu haswell -mllvm -mattr=+fma %t.o -o %t.dylib -dylib
	; RUN: llvm-objdump -d --section="__text" --no-leading-addr --no-show-raw-insn %t.dylib | FileCheck %s --check-prefix=FMA			; RUN: llvm-objdump -d --section="__text" --no-leading-addr --no-show-raw-insn %t.dylib | FileCheck %s --check-prefix=FMA

	; RUN: %lld -mcpu haswell -mllvm -mattr=-fma %t.o -o %t.dylib -dylib			; RUN: %lld -mcpu haswell -mllvm -mattr=-fma %t.o -o %t.dylib -dylib
	; RUN: llvm-objdump -d --section="__text" --no-leading-addr --no-show-raw-insn %t.dylib | FileCheck %s --check-prefix=NO-FMA			; RUN: llvm-objdump -d --section="__text" --no-leading-addr --no-show-raw-insn %t.dylib | FileCheck %s --check-prefix=NO-FMA

	; FMA: <_foo>:			; FMA: <_foo>:
	; FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm1
	; FMA-NEXT: vfmsub213ss 7(%rip), %xmm1, %xmm0			; FMA-NEXT: vfmsub213ss [[#]](%rip), %xmm1, %xmm0
	; FMA-NEXT: vfnmadd132ss %xmm1, %xmm1, %xmm0			; FMA-NEXT: vfnmadd132ss %xmm1, %xmm1, %xmm0
	; FMA-NEXT: retq			; FMA-NEXT: retq

	; NO-FMA: <_foo>:			; NO-FMA: <_foo>:
	; NO-FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; NO-FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm1
	; NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm0			; NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm0
	; NO-FMA-NEXT: vmovss 16(%rip), %xmm2			; NO-FMA-NEXT: vmovss [[#]](%rip), %xmm2
	; NO-FMA-NEXT: vsubss %xmm0, %xmm2, %xmm0			; NO-FMA-NEXT: vsubss %xmm0, %xmm2, %xmm0
	; NO-FMA-NEXT: vmulss %xmm0, %xmm1, %xmm0			; NO-FMA-NEXT: vmulss %xmm0, %xmm1, %xmm0
	; NO-FMA-NEXT: vaddss %xmm0, %xmm1, %xmm0			; NO-FMA-NEXT: vaddss %xmm0, %xmm1, %xmm0
	; NO-FMA-NEXT: retq			; NO-FMA-NEXT: retq

	target triple = "x86_64-apple-darwin"			target triple = "x86_64-apple-darwin"
	target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

	define float @foo(float %x) #0 {			define float @foo(float %x) #0 {
	%div = fdiv fast float 1.0, %x			%div = fdiv fast float 1.0, %x
	ret float %div			ret float %div
	}			}

	attributes #0 = { "unsafe-fp-math"="true" "reciprocal-estimates"="divf,vec-divf" }			attributes #0 = { "unsafe-fp-math"="true" "reciprocal-estimates"="divf,vec-divf" }
