This is an archive of the discontinued LLVM Phabricator instance.

Differential D79153

[lld-macho] Avoid unnecessary preprocessing of relocations
Abandoned · Public

Authored by int3 on Apr 29 2020, 9:53 PM.

Details

Reviewers

ruiu
pcc
MaskRay
smeenai
alexander-shaposhnikov
gkm
Ktwu
christylee

Summary

Previously, we parsed them into a Reloc struct when reading the input,
but that's unnecessary. Handling them at output time is closer to what
lld-ELF is doing, and should make future parallelization work easier.
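
To make the idea concrete, here is a minimal sketch of reading relocation records straight from the mapped file at output time instead of copying them into a parsed struct up front. It is not the patch's code: `RawReloc`, `SectionRelocs`, `countPcRelExtern` and `makeImage` are hypothetical names, and the bit layout assumes the usual little-endian packing of a Mach-O `relocation_info` record.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one 8-byte relocation record. The second word
// packs symbolnum (24 bits), pcrel (1), length (2), extern (1), type (4),
// least-significant bits first (assumed little-endian layout).
struct RawReloc {
  int32_t address;
  uint32_t packed;

  uint32_t symbolNum() const { return packed & 0xFFFFFF; }
  bool isPcRel() const { return (packed >> 24) & 1; }
  bool isExtern() const { return (packed >> 27) & 1; }
  uint8_t type() const { return (packed >> 28) & 0xF; }
};

// Non-owning view over one section's relocation table. Only an offset and a
// count are stored per section; records are decoded on demand.
struct SectionRelocs {
  const uint8_t *fileStart;
  uint32_t relOff;
  uint32_t nReloc;

  RawReloc get(uint32_t i) const {
    assert(i < nReloc && "relocation index out of range");
    RawReloc r;
    // memcpy avoids unaligned reads from the file buffer.
    std::memcpy(&r, fileStart + relOff + i * sizeof(RawReloc),
                sizeof(RawReloc));
    return r;
  }
};

// Walks the raw table without building an intermediate vector of records.
inline uint32_t countPcRelExtern(const SectionRelocs &relocs) {
  uint32_t n = 0;
  for (uint32_t i = 0; i < relocs.nReloc; ++i) {
    RawReloc r = relocs.get(i);
    if (r.isExtern() && r.isPcRel())
      ++n;
  }
  return n;
}

// Builds a fake file image: `pad` bytes followed by the given records.
inline std::vector<uint8_t> makeImage(size_t pad,
                                      const std::vector<RawReloc> &recs) {
  std::vector<uint8_t> image(pad + recs.size() * sizeof(RawReloc));
  if (!recs.empty())
    std::memcpy(image.data() + pad, recs.data(),
                recs.size() * sizeof(RawReloc));
  return image;
}
```

The trade-off is that every pass over the relocations repeats the decoding, which is the cost the review discussion goes on to weigh.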

Depends on D79114.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

int3 created this revision. Apr 29 2020, 9:53 PM

Herald added a project: Restricted Project. Apr 29 2020, 9:53 PM

Herald added a subscriber: llvm-commits.

update

Hmm. I thought part of the reason the original prototype parsed them early was to handle subsections via symbols (where you may need to adjust relocations based on subsection splitting). Do you have a sense of how this would play with that?

If I understand our current setup, we're parsing the relocations early but only resolving them (as in writing out their target) at the end, so that end part should still represent opportunities for parallelism, I think. (We should also figure out exactly what it is that COFF and ELF parallelize, and check if our design can handle parallelizing those as well.)

Harbormaster failed remote builds in B55248: Diff 261125! Apr 29 2020, 10:41 PM

Harbormaster failed remote builds in B55249: Diff 261126!

I'm still a bit hazy on what subsections and their relocations will end up looking like, but I don't see why we would need to adjust relocations at load time instead of at output time. From what I understand, we basically need to figure out which subsection a given relocation's address points into. Presumably subsections will keep track of their original address ranges in the input file, so we can do the relocation -> subsection mapping at output time too. Well, I should probably have a look at how ld64 handles subsections...

I *think* what ld64 does is translate the raw relocation structures into "fixups" that target specific atoms / subsections. But it's still not clear to me that we can't do the relocation -> subsection mapping at output time. Moreover, given the current state of the implementation, I don't think having a separate Reloc struct is super useful -- all we're really doing is a 1:1 copy of various field values, plus the symbol/section resolution, and the latter can definitely be done at output time.
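
As a rough illustration of the relocation -> subsection mapping discussed above, the lookup could be a binary search over subsection start offsets. This is only a sketch: `Subsection` and `findSubsection` are hypothetical names, and it assumes each subsection records its original offset in the input section and that the list is sorted and starts at offset 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical subsection record: remembers where it started in the original
// input section. The vector passed to findSubsection must be sorted by
// inputOffset and must start at offset 0.
struct Subsection {
  uint32_t inputOffset;
  uint32_t size;
};

// Returns the index of the subsection containing `offset`, in O(log N).
inline size_t findSubsection(const std::vector<Subsection> &subs,
                             uint32_t offset) {
  assert(!subs.empty() && subs.front().inputOffset == 0);
  // upper_bound finds the first subsection starting strictly after `offset`;
  // the subsection just before it is the one that contains the offset.
  auto it = std::upper_bound(
      subs.begin(), subs.end(), offset,
      [](uint32_t off, const Subsection &s) { return off < s.inputOffset; });
  return static_cast<size_t>(it - subs.begin()) - 1;
}
```

With M relocations and N subsections, running this lookup for every relocation costs O(M log N), whether it happens at load time or at output time.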

The architecture may need to be different here IMHO because of subsections. Don't forget that you need to map relocations onto subsections in order to implement gc-sections, and depending on the number of subsections you have per section, that could get expensive without an intermediate data structure. On top of that you'd still need the O(M log N) at output time. To me it seemed better to pay the O(M log N) up front once and avoid the cost at gc-sections time.

Oh, that makes sense, I hadn't thought about gc-sections. Thanks for the insight @pcc!
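
The gc-sections argument above can be sketched as follows. Assuming the relocation targets have already been resolved once into a per-subsection edge list (the "intermediate data structure"), the mark phase becomes a plain graph walk with no per-relocation lookup. `Edges` and `markLive` are hypothetical names for illustration, not lld APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical intermediate structure: for each subsection index, the indices
// of the subsections its relocations point at, resolved once at load time.
using Edges = std::vector<std::vector<size_t>>;

// Marks every subsection reachable from `roots`. Each edge is followed at
// most once per live source, and no address-to-subsection search is needed
// here because the targets were resolved up front.
inline std::vector<bool> markLive(const Edges &edges,
                                  const std::vector<size_t> &roots) {
  std::vector<bool> live(edges.size(), false);
  std::vector<size_t> worklist;
  for (size_t r : roots) {
    assert(r < edges.size() && "root index out of range");
    if (!live[r]) {
      live[r] = true;
      worklist.push_back(r);
    }
  }
  while (!worklist.empty()) {
    size_t cur = worklist.back();
    worklist.pop_back();
    for (size_t next : edges[cur]) {
      if (!live[next]) {
        live[next] = true;
        worklist.push_back(next);
      }
    }
  }
  return live;
}
```

The same edge list could then be reused when writing the output, which is the "pay once up front" design the comment argues for.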

int3 mentioned this in D79211: [lld-macho] Support pc-relative section relocations. Apr 30 2020, 5:23 PM

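Revision Contents

Path                          Size
lld/MachO/InputFiles.h        3 lines
lld/MachO/InputFiles.cpp      39 lines
lld/MachO/InputSection.h      13 lines
lld/MachO/InputSection.cpp    47 lines
lld/MachO/Writer.cpp          16 lines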

Diff 261126

lld/MachO/InputFiles.h

[37 lines collapsed; context: public:]
   std::vector<Symbol *> symbols;
   std::vector<InputSection *> sections;

 protected:
   InputFile(Kind kind, MemoryBufferRef mb) : mb(mb), fileKind(kind) {}

   std::vector<InputSection *> parseSections(ArrayRef<llvm::MachO::section_64>);

-  void parseRelocations(const llvm::MachO::section_64 &,
-                        std::vector<Reloc> &relocs);
-
 private:
   const Kind fileKind;
 };

 // .o file
 class ObjFile : public InputFile {
 public:
   explicit ObjFile(MemoryBufferRef mb);
[28 lines collapsed]

lld/MachO/InputFiles.cpp

[136 lines collapsed; context: for (const section_64 &sec : sections) {]
     isec->segname = StringRef(sec.segname, strnlen(sec.segname, 16));
     isec->data = {buf + sec.offset, static_cast<size_t>(sec.size)};
     if (sec.align >= 32)
       error("alignment " + std::to_string(sec.align) + " of section " +
             isec->name + " is too large");
     else
       isec->align = 1 << sec.align;
     isec->flags = sec.flags;
+    isec->relOff = sec.reloff;
+    isec->nReloc = sec.nreloc;
     ret.push_back(isec);
   }

   return ret;
 }

-void InputFile::parseRelocations(const section_64 &sec,
-                                 std::vector<Reloc> &relocs) {
-  auto *buf = reinterpret_cast<const uint8_t *>(mb.getBufferStart());
-  ArrayRef<any_relocation_info> relInfos(
-      reinterpret_cast<const any_relocation_info *>(buf + sec.reloff),
-      sec.nreloc);
-
-  for (const any_relocation_info &anyRel : relInfos) {
-    Reloc r;
-    if (anyRel.r_word0 & R_SCATTERED) {
-      error("TODO: Scattered relocations not supported");
-    } else {
-      auto rel = reinterpret_cast<const relocation_info &>(anyRel);
-      r.type = rel.r_type;
-      r.offset = rel.r_address;
-      r.addend = target->getImplicitAddend(buf + sec.offset + r.offset, r.type);
-      if (rel.r_extern)
-        r.target = symbols[rel.r_symbolnum];
-      else {
-        error("TODO: Non-extern relocations are not supported");
-        continue;
-      }
-    }
-    relocs.push_back(r);
-  }
-}
-
 ObjFile::ObjFile(MemoryBufferRef mb) : InputFile(ObjKind, mb) {
   auto *buf = reinterpret_cast<const uint8_t *>(mb.getBufferStart());
   auto *hdr = reinterpret_cast<const mach_header_64 *>(mb.getBufferStart());
   ArrayRef<section_64> objSections;

   if (const load_command *cmd = findCommand(hdr, LC_SEGMENT_64)) {
     auto *c = reinterpret_cast<const segment_command_64 *>(cmd);
     objSections = ArrayRef<section_64>{
[28 lines collapsed; context: for (const nlist_64 &sym : nList) {]
         symbols.push_back(symtab->addDefined(name, isec, value));
         continue;
       }

       // Local defined symbol
       symbols.push_back(make<Defined>(name, isec, value));
     }
   }
-
-  // The relocations may refer to the symbols, so we parse them after we have
-  // the symbols loaded.
-  if (!sections.empty()) {
-    auto it = sections.begin();
-    for (const section_64 &sec : objSections) {
-      parseRelocations(sec, (*it)->relocs);
-      ++it;
-    }
-  }
 }

 DylibFile::DylibFile(MemoryBufferRef mb) : InputFile(DylibKind, mb) {
   auto *buf = reinterpret_cast<const uint8_t *>(mb.getBufferStart());
   auto *hdr = reinterpret_cast<const mach_header_64 *>(mb.getBufferStart());

   // Initialize dylibName.
   if (const load_command *cmd = findCommand(hdr, LC_ID_DYLIB)) {
[40 lines collapsed]

lld/MachO/InputSection.h

[16 lines collapsed]
 namespace lld {
 namespace macho {

 class InputFile;
 class InputSection;
 class OutputSegment;
 class Symbol;

-struct Reloc {
-  uint8_t type;
-  uint32_t addend;
-  uint32_t offset;
-  llvm::PointerUnion<Symbol *, InputSection *> target;
-};
-
 class InputSection {
 public:
   virtual ~InputSection() = default;
   virtual size_t getSize() const { return data.size(); }
   virtual uint64_t getFileSize() const { return getSize(); }
   uint64_t getFileOffset() const;
   // Don't emit section_64 headers for hidden sections.
   virtual bool isHidden() const { return false; }
   // Unneeded sections are omitted entirely (header and body).
   virtual bool isNeeded() const { return true; }
   virtual void writeTo(uint8_t *buf);
+  ArrayRef<llvm::MachO::any_relocation_info> relocations() const;

   InputFile *file = nullptr;
   OutputSegment *parent = nullptr;
   StringRef name;
   StringRef segname;

   ArrayRef<uint8_t> data;

   // TODO these properties ought to live in an OutputSection class.
   // Move them once available.
   uint64_t addr = 0;
   uint32_t align = 1;
   uint32_t sectionIndex = 0;
   uint32_t flags = 0;

-  std::vector<Reloc> relocs;
+  // Relocation data offset in the input file.
+  uint32_t relOff = 0;
+  // Number of relocation entries.
+  uint32_t nReloc = 0;
 };

 extern std::vector<InputSection *> inputSections;

 } // namespace macho
 } // namespace lld

 #endif

lld/MachO/InputSection.cpp

 //===- InputSection.cpp ---------------------------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//

 #include "InputSection.h"
+#include "InputFiles.h"
 #include "OutputSegment.h"
+#include "SymbolTable.h"
 #include "Symbols.h"
 #include "Target.h"

+#include "lld/Common/ErrorHandler.h"
 #include "lld/Common/Memory.h"
 #include "llvm/Support/Endian.h"

 using namespace llvm::MachO;
 using namespace llvm::support;
 using namespace lld;
 using namespace lld::macho;

 std::vector<InputSection *> macho::inputSections;

 uint64_t InputSection::getFileOffset() const {
   return parent->fileOff + addr - parent->firstSection()->addr;
 }

 void InputSection::writeTo(uint8_t *buf) {
   if (!data.empty())
     memcpy(buf, data.data(), data.size());

-  for (Reloc &r : relocs) {
+  for (const any_relocation_info &anyRel : relocations()) {
+    if (anyRel.r_word0 & R_SCATTERED)
+      fatal("TODO: Scattered relocations not supported");
+
+    auto rel = reinterpret_cast<const relocation_info &>(anyRel);
+    uint32_t addend =
+        target->getImplicitAddend(buf + rel.r_address, rel.r_type);
     uint64_t va = 0;
-    if (auto *s = r.target.dyn_cast<Symbol *>()) {
-      if (auto *dylibSymbol = dyn_cast<DylibSymbol>(s)) {
-        va = target->getDylibSymbolVA(*dylibSymbol, r.type);
-      } else {
+    if (rel.r_extern) {
+      Symbol *s = file->symbols[rel.r_symbolnum];
+      if (auto *dylibSymbol = dyn_cast<DylibSymbol>(s))
+        va = target->getDylibSymbolVA(*dylibSymbol, rel.r_type);
+      else
         va = s->getVA();
-      }
-    } else if (auto *isec = r.target.dyn_cast<InputSection *>()) {
-      va = isec->addr;
     } else {
-      llvm_unreachable("Unknown relocation target");
+      fatal("TODO: Non-extern relocations are not supported");
     }

-    uint64_t val = va + r.addend;
-    if (1) // TODO: handle non-pcrel relocations
-      val -= addr + r.offset;
-    target->relocateOne(buf + r.offset, r.type, val);
+    uint64_t val = va + addend;
+    if (rel.r_pcrel)
+      val -= addr + rel.r_address;
+    else
+      fatal("TODO: handle non-pcrel relocations");
+
+    target->relocateOne(buf + rel.r_address, rel.r_type, val);
   }
 }
+
+ArrayRef<llvm::MachO::any_relocation_info> InputSection::relocations() const {
+  // Synthetic sections will not have a corresponding InputFile.
+  if (file == nullptr)
+    return {};
+
+  return {reinterpret_cast<const any_relocation_info *>(
+              file->mb.getBufferStart() + relOff),
+          nReloc};
+}
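
For readers following the arithmetic in `writeTo`, the pc-relative branch can be isolated into a small worked example. This is a sketch with a hypothetical helper name (`pcRelValue`); it mirrors only the `va + addend - (addr + r_address)` computation from the diff and makes no claim about how a particular target encodes the result.

```cpp
#include <cassert>
#include <cstdint>

// Value written for a pc-relative relocation, following the diff:
//   val = targetVA + addend - (sectionAddr + relocOffset)
// Unsigned arithmetic wraps, so a backward reference yields the two's
// complement encoding of a negative displacement.
inline uint64_t pcRelValue(uint64_t targetVA, uint32_t addend,
                           uint64_t sectionAddr, uint32_t relocOffset) {
  uint64_t val = targetVA + addend;
  val -= sectionAddr + relocOffset;
  return val;
}
```

For example, a target at 0x2000 with addend 4, referenced from offset 0x10 of a section placed at 0x1000, gives 0x2004 - 0x1010 = 0xFF4.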

lld/MachO/Writer.cpp

[333 lines collapsed]
 template <typename SectionType, typename... ArgT>
 SectionType *createInputSection(ArgT &&... args) {
   auto *section = make<SectionType>(std::forward<ArgT>(args)...);
   inputSections.push_back(section);
   return section;
 }

 void Writer::scanRelocations() {
-  for (InputSection *sect : inputSections)
-    for (Reloc &r : sect->relocs)
-      if (auto *s = r.target.dyn_cast<Symbol *>())
+  for (InputSection *isec : inputSections) {
+    for (const any_relocation_info &anyRel : isec->relocations()) {
+      if (anyRel.r_word0 & R_SCATTERED)
+        fatal("TODO: Scattered relocations not supported");
+
+      auto rel = reinterpret_cast<const relocation_info &>(anyRel);
+      if (rel.r_extern) {
+        Symbol *s = isec->file->symbols[rel.r_symbolnum];
         if (auto *dylibSymbol = dyn_cast<DylibSymbol>(s))
-          target->prepareDylibSymbolRelocation(*dylibSymbol, r.type);
+          target->prepareDylibSymbolRelocation(*dylibSymbol, rel.r_type);
+      }
+    }
+  }
 }

 void Writer::createLoadCommands() {
   headerSection->addLoadCommand(
       make<LCDyldInfo>(bindingSection, lazyBindingSection, exportSection));
   headerSection->addLoadCommand(
       make<LCSymtab>(symtabSection, stringTableSection));
   headerSection->addLoadCommand(make<LCDysymtab>());
[163 lines collapsed]