This is an archive of the discontinued LLVM Phabricator instance.

[opt] Replace the recursive walk for GC with a worklist algorithm.
ClosedPublic

Authored by chandlerc on Jun 26 2015, 9:02 PM.

Details

Reviewers

Commits

rG59013c387e8d: [opt] Replace the recursive walk for GC with a worklist algorithm.
rLLD240995: [opt] Replace the recursive walk for GC with a worklist algorithm.
rL240995: [opt] Replace the recursive walk for GC with a worklist algorithm.

Summary

This flattens the entire liveness walk from a recursive mark approach to
a worklist approach. It also sinks the worklist management completely
out of the SectionChunk and into the Writer by exposing the ability to
iterate over children of a chunk and over the symbol bodies of relocated
symbols. I'm not 100% happy with the API names, so suggestions welcome
there.
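
For readers unfamiliar with the pattern, the "iterate over the symbol bodies of relocated symbols" idea can be approximated in standard C++ without llvm::iterator_adaptor_base. This is only a sketch under stated assumptions: Symbol, Reloc, and SymbolRange are hypothetical stand-ins rather than lld types, and the actual patch builds symbol_iterator on LLVM's adaptor class instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for SymbolBody and coff_relocation.
struct Symbol {
  int Id;
};
struct Reloc {
  std::size_t SymbolTableIndex;
};

// A range over relocation records that yields the referenced symbol for each
// record, so callers never touch the symbol table lookup directly.
class SymbolRange {
  const std::vector<Reloc> &Relocs;
  const std::vector<Symbol *> &Table;

public:
  class iterator {
    const Reloc *I;
    const std::vector<Symbol *> *Table;

  public:
    iterator(const Reloc *I, const std::vector<Symbol *> *Table)
        : I(I), Table(Table) {}
    // Dereferencing maps the relocation to its symbol.
    Symbol *operator*() const { return (*Table)[I->SymbolTableIndex]; }
    iterator &operator++() {
      ++I;
      return *this;
    }
    bool operator!=(const iterator &Other) const { return I != Other.I; }
  };

  SymbolRange(const std::vector<Reloc> &Relocs,
              const std::vector<Symbol *> &Table)
      : Relocs(Relocs), Table(Table) {}

  iterator begin() const { return iterator(Relocs.data(), &Table); }
  iterator end() const {
    return iterator(Relocs.data() + Relocs.size(), &Table);
  }
};
```

A caller can then write `for (Symbol *S : SymbolRange(Relocs, Table))`, which is roughly the shape that `SC->symbols()` gives the Writer.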

This allows us to use a single worklist for the entire recursive walk
and would also be a natural place to take advantage of parallelism at
some future point.
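
As a rough illustration of the single-worklist walk (not the code in this patch), here is a self-contained sketch on a generic graph. Node, markLive, and makeChain are hypothetical names; a Node stands in for a section, and Refs stands in for both relocation targets and associative children.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a section: a live bit plus outgoing references.
struct Node {
  bool Live = false;
  std::vector<Node *> Refs;
};

// Marks everything reachable from Roots using an explicit worklist and
// returns the number of nodes newly marked live.
std::size_t markLive(const std::vector<Node *> &Roots) {
  std::vector<Node *> Worklist;
  std::size_t NumMarked = 0;

  // Mark as we push, so a node is never pushed twice.
  auto Enqueue = [&](Node *N) {
    if (N->Live)
      return;
    N->Live = true;
    ++NumMarked;
    Worklist.push_back(N);
  };

  for (Node *R : Roots)
    Enqueue(R);
  while (!Worklist.empty()) {
    Node *N = Worklist.back();
    Worklist.pop_back();
    for (Node *Ref : N->Refs)
      Enqueue(Ref);
  }
  return NumMarked;
}

// Builds a chain N0 -> N1 -> ... -> N(Len-1). A recursive mark would need
// one stack frame per link; the worklist version uses constant call depth.
std::vector<Node> makeChain(std::size_t Len) {
  std::vector<Node> Nodes(Len);
  for (std::size_t I = 0; I + 1 < Len; ++I)
    Nodes[I].Refs.push_back(&Nodes[I + 1]);
  return Nodes;
}
```

Because the only state is the worklist and the per-node live bit, cycles terminate naturally and deep reference chains do not grow the call stack.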

With this, we completely inline away the GC walk into the
Writer::markLive function and it makes it very easy to profile what is
slow. Currently, time is being wasted checking whether a Chunk isa
SectionChunk (it essentially always is), finding (or skipping)
a replacement for a symbol, and chasing pointers between symbols and
their chunks. There are a bunch of things we can do to fix this, and it's
easier to do them after this change IMO.

This change alone saves 1-2% of the time for my self-link of lld.exe
(which I'm running and benchmarking on Linux ironically).

Perhaps more notably, we'll no longer blow out the stack for large
links. =]

Just as an FYI, at this point, I/O is starting to really dominate the
profile. Well over 10% of the time appears to be inside the kernel doing
page table silliness. I think a decent chunk of this can be nuked as
well, but it's a little odd as cross-linking in this way isn't really
the primary goal here.

Depends on D10789

Diff Detail

Repository: rL LLVM

Event Timeline

chandlerc updated this revision to Diff 28624. Jun 26 2015, 9:02 PM

chandlerc retitled this revision to "[opt] Replace the recursive walk for GC with a worklist algorithm."

chandlerc updated this object.

chandlerc edited the test plan for this revision.

chandlerc added a reviewer: ruiu.

chandlerc added a parent revision: D10789: [opt] Hoist the call throuh SymbolBody::getReplacement out of the inline method to get a SymbolBody and into the callers, and kill now dead includes..

chandlerc added a subscriber: Unknown Object (MLST).

majnemer added a subscriber: majnemer. Jun 26 2015, 11:39 PM

majnemer added inline comments.

COFF/Writer.cpp
118 ↗	(On Diff #28624)	Realistically, won't we always blow out this SmallVector?

ruiu added inline comments. Jun 27 2015, 12:05 PM

COFF/Chunks.h
13 ↗	(On Diff #28624)	This makes this header and InputFiles.h mutually dependent. Please update InputFiles.h to remove inclusion of this file and add a forward declaration for the class Chunk.
141 ↗	(On Diff #28624)	symbol_iterator -> SymbolIterator?
151–156 ↗	(On Diff #28624)	Move this to the beginning of the class definition. Also write operator*() on one line if it fits in 80 columns.
COFF/Writer.cpp
118 ↗	(On Diff #28624)	Yeah, this can be much longer than 16. Is SmallVector faster than std::vector for non-small vectors? If not, can we use std::vector instead?
135 ↗	(On Diff #28624)	You don't need to call markLive() when you add a new chunk to the Worklist. Instead you can call that function here.
146–147 ↗	(On Diff #28624)	Associative sections are always SectionChunk. You want to change the type to eliminate this dyn_cast.

silvas added a subscriber: silvas. Jun 29 2015, 11:54 AM

silvas added inline comments.

COFF/Writer.cpp
118 ↗	(On Diff #28624)	Yes. For large POD arrays it can use realloc tricks (e.g. swapping page tables instead of actually copying).

Update fixing several reviewer comments (and rebasing on Rui's commits).

chandlerc added inline comments. Jun 29 2015, 12:01 PM

COFF/Chunks.h
13 ↗	(On Diff #28624)	Yea, the prior change fixed that.
141 ↗	(On Diff #28624)	This is a common convention in LLVM to name iterators based on the STL naming conventions. I'd really like to avoid deviating from that here.
151–156 ↗	(On Diff #28624)	Moved, but whether this fits on one line is up to clang-format. =] It formatted it this way, and I don't want to change it.
COFF/Writer.cpp
118 ↗	(On Diff #28624)	SmallVector is much faster than std::vector in my experience. Also, we can avoid a bunch of early allocations by starting this fairly high. I've raised the number on the stack to 256 so that we get the early churn out of the way without malloc. I could be convinced it should be closer to 4k if you'd like to wait until we're around a page size before we start hitting the system allocator.
135 ↗	(On Diff #28624)	If we fail to call markLive here, then we will add the same SectionChunk to the worklist many times. We're essentially using the bool 'Live' state to ensure we only add a chunk to the worklist once.
146–147 ↗	(On Diff #28624)	Done (reflecting your change of the underlying container type).
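
The inline-capacity argument made for line 118 can be illustrated with a toy container. InlineStack is a hypothetical name, and this is not how llvm::SmallVector is implemented (SmallVector moves its elements to a heap buffer once it outgrows the inline one). The sketch only shows the property under discussion: the first N pushes touch no allocator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy LIFO with N slots stored inside the object itself. Only elements
// beyond the first N go to heap-backed storage. Assumes T is default
// constructible and copyable.
template <typename T, std::size_t N> class InlineStack {
  std::array<T, N> Inline{}; // lives wherever the InlineStack lives
  std::vector<T> Spill;      // used only once Size exceeds N
  std::size_t Size = 0;

public:
  void push_back(const T &V) {
    if (Size < N)
      Inline[Size] = V;
    else
      Spill.push_back(V);
    ++Size;
  }

  T pop_back_val() {
    assert(Size > 0 && "pop from an empty stack");
    --Size;
    if (Size < N)
      return Inline[Size];
    T V = Spill.back();
    Spill.pop_back();
    return V;
  }

  bool empty() const { return Size == 0; }
  // True while at least one element currently sits in heap storage.
  bool spilled() const { return !Spill.empty(); }
};
```

With N set to 256, as in the updated patch, a worklist that never holds more than 256 pending sections would stay entirely in the inline buffer.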

LGTM. I'm planning to make a change to the resolver which would conflict with this change, so I'd like to see this landed before I start on my patch.

COFF/Symbols.h
130 ↗	(On Diff #28699)	I still want you to rename this class for consistency.
COFF/Writer.cpp
135 ↗	(On Diff #28699)	So you can write SC->markLive() here only once, no? Then you can remove markLive() from all the other places in this function.

This revision is now accepted and ready to land. Jun 29 2015, 1:34 PM

Create the correct patch for this, sorry for the prior patch being garbage.

I'll confirm with you that the worklist logic below is OK, and then land. Thanks!

COFF/Symbols.h
130 ↗	(On Diff #28699)	Sorry, this should not have been part of this patch. It'll be in D10792, and I'll update that review thread shortly.
COFF/Writer.cpp
135 ↗	(On Diff #28699)	No, that is a different algorithm. While we look for the symbols referenced by this section, we may see a symbol more than once. In that case, before we put it into the worklist, we first test that it is not-yet-live, mark it live, and then add it to the worklist. If I make the change you are suggesting, then we would potentially add the same not-yet-live symbol to the worklist many times. This would either require adding a set to the worklist, or make the worklist grow much larger and require an early exit from the processing loop. Neither seems good. Given that we have a bit to track the liveness embedded in the chunk, it seems much better to use that to add them to the worklist exactly once.
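
To make the difference between the two orderings concrete, here is a small sketch that counts worklist pushes under each strategy. Node and both function names are hypothetical; the second function is one possible reading of the "mark once at the top of the loop" suggestion, with the early exit it would need.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a section with relocation targets.
struct Node {
  bool Live = false;
  std::vector<Node *> Refs;
};

// Strategy in the patch: test and set the live bit before pushing, so each
// node enters the worklist at most once.
std::size_t pushesWhenMarkingOnPush(Node *Root) {
  std::vector<Node *> Worklist;
  std::size_t Pushes = 0;
  Root->Live = true;
  Worklist.push_back(Root);
  ++Pushes;
  while (!Worklist.empty()) {
    Node *N = Worklist.back();
    Worklist.pop_back();
    for (Node *Ref : N->Refs)
      if (!Ref->Live) {
        Ref->Live = true;
        Worklist.push_back(Ref);
        ++Pushes;
      }
  }
  return Pushes;
}

// Alternative: mark only when popping. A target seen several times before it
// is popped is pushed several times, and the loop needs an early exit.
std::size_t pushesWhenMarkingOnPop(Node *Root) {
  std::vector<Node *> Worklist{Root};
  std::size_t Pushes = 1;
  while (!Worklist.empty()) {
    Node *N = Worklist.back();
    Worklist.pop_back();
    if (N->Live)
      continue; // duplicate entry, already processed
    N->Live = true;
    for (Node *Ref : N->Refs)
      if (!Ref->Live) {
        Worklist.push_back(Ref);
        ++Pushes;
      }
  }
  return Pushes;
}
```

For a section with three relocations against the same target, the first version pushes two entries in total and the second pushes four, although both end with the same live set.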

Closed by commit rL240995: [opt] Replace the recursive walk for GC with a worklist algorithm. (authored by chandlerc). Jun 29 2015, 2:13 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lld/trunk/COFF/Chunks.h	36 lines
lld/trunk/COFF/Chunks.cpp	17 lines
lld/trunk/COFF/Symbols.h	2 lines
lld/trunk/COFF/Writer.cpp	35 lines

Diff 28709

lld/trunk/COFF/Chunks.h

//===- Chunks.h -----------------------------------------------------------===//		//===- Chunks.h -----------------------------------------------------------===//
//		//
// The LLVM Linker		// The LLVM Linker
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLD_COFF_CHUNKS_H		#ifndef LLD_COFF_CHUNKS_H
#define LLD_COFF_CHUNKS_H		#define LLD_COFF_CHUNKS_H

		#include "InputFiles.h"
#include "lld/Core/LLVM.h"		#include "lld/Core/LLVM.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
		#include "llvm/ADT/iterator.h"
#include "llvm/ADT/iterator_range.h"		#include "llvm/ADT/iterator_range.h"
#include "llvm/Object/COFF.h"		#include "llvm/Object/COFF.h"
#include <map>		#include <map>
#include <vector>		#include <vector>

namespace lld {		namespace lld {
namespace coff {		namespace coff {

using llvm::COFF::ImportDirectoryTableEntry;		using llvm::COFF::ImportDirectoryTableEntry;
using llvm::object::COFFSymbolRef;		using llvm::object::COFFSymbolRef;
using llvm::object::SectionRef;		using llvm::object::SectionRef;
using llvm::object::coff_relocation;		using llvm::object::coff_relocation;
using llvm::object::coff_section;		using llvm::object::coff_section;
using llvm::sys::fs::file_magic;		using llvm::sys::fs::file_magic;

class Defined;		class Defined;
class DefinedRegular;		class DefinedRegular;
class DefinedImportData;		class DefinedImportData;
class ObjectFile;		class ObjectFile;
class OutputSection;		class OutputSection;
		class SymbolBody;

// A Chunk represents a chunk of data that will occupy space in the		// A Chunk represents a chunk of data that will occupy space in the
// output (if the resolver chose that). It may or may not be backed by		// output (if the resolver chose that). It may or may not be backed by
// a section of an input file. It could be linker-created data, or		// a section of an input file. It could be linker-created data, or
// doesn't even have actual data (if common or bss).		// doesn't even have actual data (if common or bss).
class Chunk {		class Chunk {
public:		public:
enum Kind { SectionKind, OtherKind };		enum Kind { SectionKind, OtherKind };
... 58 lines not shown ...	protected:

// The alignment of this chunk. The writer uses the value.		// The alignment of this chunk. The writer uses the value.
uint32_t Align = 1;		uint32_t Align = 1;
};		};

// A chunk corresponding to a section of an input file.		// A chunk corresponding to a section of an input file.
class SectionChunk : public Chunk {		class SectionChunk : public Chunk {
public:		public:
		class symbol_iterator : public llvm::iterator_adaptor_base<
		symbol_iterator, const coff_relocation *,
		std::random_access_iterator_tag, SymbolBody *> {
		friend SectionChunk;

		ObjectFile *File;

		symbol_iterator(ObjectFile *File, const coff_relocation *I)
		: symbol_iterator::iterator_adaptor_base(I), File(File) {}

		public:
		symbol_iterator() = default;

		SymbolBody *operator*() const {
		return File->getSymbolBody(I->SymbolTableIndex);
		}
		};

SectionChunk(ObjectFile *File, const coff_section *Header);		SectionChunk(ObjectFile *File, const coff_section *Header);
static bool classof(const Chunk *C) { return C->kind() == SectionKind; }		static bool classof(const Chunk *C) { return C->kind() == SectionKind; }
size_t getSize() const override { return Header->SizeOfRawData; }		size_t getSize() const override { return Header->SizeOfRawData; }
void writeTo(uint8_t *Buf) override;		void writeTo(uint8_t *Buf) override;
bool hasData() const override;		bool hasData() const override;
uint32_t getPermissions() const override;		uint32_t getPermissions() const override;
StringRef getSectionName() const override { return SectionName; }		StringRef getSectionName() const override { return SectionName; }
void getBaserels(std::vector<uint32_t> *Res, Defined *ImageBase) override;		void getBaserels(std::vector<uint32_t> *Res, Defined *ImageBase) override;
bool isCOMDAT() const;		bool isCOMDAT() const;

// Called if the garbage collector decides to not include this chunk		// Called if the garbage collector decides to not include this chunk
// in a final output. It's supposed to print out a log message to stdout.		// in a final output. It's supposed to print out a log message to stdout.
void printDiscardedMessage() const;		void printDiscardedMessage() const;

// Adds COMDAT associative sections to this COMDAT section. A chunk		// Adds COMDAT associative sections to this COMDAT section. A chunk
// and its children are treated as a group by the garbage collector.		// and its children are treated as a group by the garbage collector.
void addAssociative(SectionChunk *Child);		void addAssociative(SectionChunk *Child);

StringRef getDebugName() override;		StringRef getDebugName() override;
void setSymbol(DefinedRegular *S) { if (!Sym) Sym = S; }		void setSymbol(DefinedRegular *S) { if (!Sym) Sym = S; }

// Used by the garbage collector.		// Used by the garbage collector.
bool isRoot() { return Root; }		bool isRoot() { return Root; }
bool isLive() { return Live; }		bool isLive() { return Live; }
void markLive() { if (!Live) mark(); }		void markLive() {
		assert(!Live && "Cannot mark an already live section!");
		Live = true;
		}

		// Allow iteration over the bodies of this chunk's relocated symbols.
		llvm::iterator_range<symbol_iterator> symbols() const {
		return llvm::make_range(symbol_iterator(File, Relocs.begin()),
		symbol_iterator(File, Relocs.end()));
		}

		// Allow iteration over the associated child chunks for this section.
		ArrayRef<SectionChunk *> children() const { return AssocChildren; }

// Used for ICF (Identical COMDAT Folding)		// Used for ICF (Identical COMDAT Folding)
void replaceWith(SectionChunk *Other);		void replaceWith(SectionChunk *Other);
uint64_t getHash() const;		uint64_t getHash() const;
bool equals(const SectionChunk *Other) const;		bool equals(const SectionChunk *Other) const;

// A pointer pointing to a replacement for this chunk.		// A pointer pointing to a replacement for this chunk.
// Initially it points to "this" object. If this chunk is merged		// Initially it points to "this" object. If this chunk is merged
... 9 lines not shown ...	private:

const coff_section *Header;		const coff_section *Header;
StringRef SectionName;		StringRef SectionName;
std::vector<SectionChunk *> AssocChildren;		std::vector<SectionChunk *> AssocChildren;
llvm::iterator_range<const coff_relocation *> Relocs;		llvm::iterator_range<const coff_relocation *> Relocs;
size_t NumRelocs;		size_t NumRelocs;

// Used by the garbage collector.		// Used by the garbage collector.
void mark();
bool Live = false;		bool Live = false;
bool Root;		bool Root;

// Chunks are basically unnamed chunks of bytes.		// Chunks are basically unnamed chunks of bytes.
// Symbols are associated for debugging and logging purposes only.		// Symbols are associated for debugging and logging purposes only.
DefinedRegular *Sym = nullptr;		DefinedRegular *Sym = nullptr;
};		};

... 70 lines not shown ...

lld/trunk/COFF/Chunks.cpp

... 74 lines not shown ...	for (const coff_relocation &Rel : Relocs) {
case IMAGE_REL_AMD64_SECTION: add16(Off, Out->getSectionIndex()); break;		case IMAGE_REL_AMD64_SECTION: add16(Off, Out->getSectionIndex()); break;
case IMAGE_REL_AMD64_SECREL: add32(Off, S - Out->getRVA()); break;		case IMAGE_REL_AMD64_SECREL: add32(Off, S - Out->getRVA()); break;
default:		default:
llvm::report_fatal_error("Unsupported relocation type");		llvm::report_fatal_error("Unsupported relocation type");
}		}
}		}
}		}

void SectionChunk::mark() {
assert(!Live);
Live = true;

// Mark all symbols listed in the relocation table for this section.
for (const coff_relocation &Rel : Relocs) {
SymbolBody *B = File->getSymbolBody(Rel.SymbolTableIndex)->getReplacement();
if (auto *D = dyn_cast<DefinedRegular>(B))
D->markLive();
}

// Mark associative sections if any.
for (Chunk *C : AssocChildren)
if (auto *SC = dyn_cast<SectionChunk>(C))
SC->markLive();
}

void SectionChunk::addAssociative(SectionChunk *Child) {		void SectionChunk::addAssociative(SectionChunk *Child) {
AssocChildren.push_back(Child);		AssocChildren.push_back(Child);
// Associative sections are live if their parent COMDATs are live,		// Associative sections are live if their parent COMDATs are live,
// and vice versa, so they are not considered live by themselves.		// and vice versa, so they are not considered live by themselves.
Child->Root = false;		Child->Root = false;
}		}

// Windows-specific.		// Windows-specific.
... 162 lines not shown ...

lld/trunk/COFF/Symbols.h

... 130 lines not shown ...	public:
StringRef getName() override;		StringRef getName() override;
uint64_t getRVA() override { return (*Data)->getRVA() + Sym.getValue(); }		uint64_t getRVA() override { return (*Data)->getRVA() + Sym.getValue(); }
bool isExternal() override { return Sym.isExternal(); }		bool isExternal() override { return Sym.isExternal(); }
int compare(SymbolBody *Other) override;		int compare(SymbolBody *Other) override;
std::string getDebugName() override;		std::string getDebugName() override;
bool isCOMDAT() { return IsCOMDAT; }		bool isCOMDAT() { return IsCOMDAT; }
bool isLive() const { return (*Data)->isLive(); }		bool isLive() const { return (*Data)->isLive(); }
void markLive() { (*Data)->markLive(); }		void markLive() { (*Data)->markLive(); }
Chunk *getChunk() { return *Data; }		SectionChunk *getChunk() { return *Data; }
uint64_t getValue() { return Sym.getValue(); }		uint64_t getValue() { return Sym.getValue(); }

private:		private:
StringRef Name;		StringRef Name;
ObjectFile *File;		ObjectFile *File;
COFFSymbolRef Sym;		COFFSymbolRef Sym;
SectionChunk **Data;		SectionChunk **Data;
bool IsCOMDAT;		bool IsCOMDAT;
... 205 lines not shown ...

lld/trunk/COFF/Writer.cpp

	... 105 lines not shown ...
	}			}

	// Set live bit on for each reachable chunk. Unmarked (unreachable)			// Set live bit on for each reachable chunk. Unmarked (unreachable)
	// COMDAT chunks will be ignored in the next step, so that they don't			// COMDAT chunks will be ignored in the next step, so that they don't
	// come to the final output file.			// come to the final output file.
	void Writer::markLive() {			void Writer::markLive() {
	if (!Config->DoGC)			if (!Config->DoGC)
	return;			return;

				// We build up a worklist of sections which have been marked as live. We only
				// push into the worklist when we discover an unmarked section, and we mark
				// as we push, so sections never appear twice in the list.
				SmallVector<SectionChunk *, 256> Worklist;

	for (StringRef Name : Config->GCRoots)			for (StringRef Name : Config->GCRoots)
	if (auto *D = dyn_cast<DefinedRegular>(Symtab->find(Name)))			if (auto *D = dyn_cast<DefinedRegular>(Symtab->find(Name)))
				if (!D->isLive()) {
	D->markLive();			D->markLive();
				Worklist.push_back(D->getChunk());
				}
	for (Chunk *C : Symtab->getChunks())			for (Chunk *C : Symtab->getChunks())
	if (auto *SC = dyn_cast<SectionChunk>(C))			if (auto *SC = dyn_cast<SectionChunk>(C))
	if (SC->isRoot())			if (SC->isRoot() && !SC->isLive()) {
	SC->markLive();			SC->markLive();
				Worklist.push_back(SC);
				}

				while (!Worklist.empty()) {
				SectionChunk *SC = Worklist.pop_back_val();
				assert(SC->isLive() && "We mark as live when pushing onto the worklist!");

				// Mark all symbols listed in the relocation table for this section.
				for (SymbolBody *S : SC->symbols())
				if (auto *D = dyn_cast<DefinedRegular>(S->getReplacement()))
				if (!D->isLive()) {
				D->markLive();
				Worklist.push_back(D->getChunk());
				}

				// Mark associative sections if any.
				for (SectionChunk *ChildSC : SC->children())
				if (!ChildSC->isLive()) {
				ChildSC->markLive();
				Worklist.push_back(ChildSC);
				}
				}
	}			}

	// Merge identical COMDAT sections.			// Merge identical COMDAT sections.
	void Writer::dedupCOMDATs() {			void Writer::dedupCOMDATs() {
	if (Config->ICF)			if (Config->ICF)
	doICF(Symtab->getChunks());			doICF(Symtab->getChunks());
	}			}

	... 404 lines not shown ...