This is an archive of the discontinued LLVM Phabricator instance.

Paths

- lld/trunk/ELF/LinkerScript.h
- lld/trunk/ELF/LinkerScript.cpp
- lld/trunk/ELF/Strings.h
- lld/trunk/ELF/Strings.cpp

Differential D26241

[ELF] Speed-up lld up to 5 times by replacing llvm:regex with simple string matcher.
Closed · Public

Authored by evgeny777 on Nov 2 2016, 6:03 AM.


Details

Reviewers

ruiu
rafael

Commits

rGdb68845485d8: Use globMatch() instead of llvm::regex in linker scripts
rLLD285895: Use globMatch() instead of llvm::regex in linker scripts
rL285895: Use globMatch() instead of llvm::regex in linker scripts

Summary

This patch reduces the linking time of the libcxx (libcxx.llvm.org) test suite from 11.1 sec to 2.3 sec (about 4.8x faster) on my PC (i7-2600K, 16 GB RAM, SSD).
I can provide VSP (Visual Studio performance report) files on request.

Diff Detail

Repository: rL LLVM

Event Timeline

evgeny777 updated this revision to Diff 76696. · Nov 2 2016, 6:03 AM

evgeny777 retitled this revision to "[ELF] Speed-up lld up to 5 times by replacing llvm:regex with simple string matcher."

evgeny777 updated this object.

evgeny777 added reviewers: ruiu, rafael.

evgeny777 set the repository for this revision to rL LLVM.

evgeny777 added a project: lld.

evgeny777 added subscribers: grimar, ikudrin, llvm-commits.

I wonder whether this really passes all the tests.
My concern comes from https://reviews.llvm.org/D23803?vs=on&id=68992&whitespace=ignore-most#toc, which was initially posted to implement [char].
That patch was later converted to use regexps instead of the initial implementation. The implementation in your patch does not seem to handle [char], though. Shouldn't our tests catch that?

Can you point out the test that should fail? All of them pass for me (Ubuntu host).
Also, it looks like globMatch does support the single-character wildcard (see Strings.cpp:41).

emaste added a subscriber: emaste. · Nov 2 2016, 6:34 AM

So since only the linker script was switched to globMatch(), other places, such as version-script-complex-wildcards.s, will not fail.
That leaves some inconsistency, though. Maybe we should switch all the other places away from regexps after this one?

Wouldn't it be better in general to handle fixed string prefixes directly in the matcher and only fall back to a more generic matcher if the prefix check passes? Independent of glob or regex, there are three basic situations:

- fixed string: done after the prefix match
- fixed prefix, wildcard rest: done after the prefix match
- everything else: needs the generic matcher

I wonder how much of the performance benefit is obtained by just doing this peephole optimisation.
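The three cases above could be sketched roughly as follows. This is a hypothetical C++17 illustration, not lld code: `PatternKind`, `CompiledPattern` and `simpleGlob` are names made up here, and `std::string_view` stands in for `llvm::StringRef`. Each pattern is classified once; exact and prefix patterns are decided by a single comparison, and only the remaining patterns reach a generic '*'/'?' matcher, after the literal prefix has already been checked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Generic fallback (hypothetical): '*' matches any run of characters,
// including none, and '?' matches exactly one character.
static bool simpleGlob(std::string_view P, std::string_view S) {
  if (P.empty())
    return S.empty();
  if (P[0] == '*') {
    for (std::size_t I = 0; I <= S.size(); ++I)
      if (simpleGlob(P.substr(1), S.substr(I)))
        return true;
    return false;
  }
  if (S.empty() || (P[0] != '?' && P[0] != S[0]))
    return false;
  return simpleGlob(P.substr(1), S.substr(1));
}

enum class PatternKind { Exact, Prefix, Generic };

// A pattern classified once, up front, so that the common cases never
// reach the generic matcher.
struct CompiledPattern {
  PatternKind Kind;
  std::string Pattern; // full pattern text
  std::string Prefix;  // literal characters before the first metacharacter

  explicit CompiledPattern(std::string P) : Pattern(std::move(P)) {
    std::size_t Meta = Pattern.find_first_of("*?");
    Prefix = Pattern.substr(0, Meta); // whole pattern if Meta == npos
    if (Meta == std::string::npos)
      Kind = PatternKind::Exact; // e.g. ".text"
    else if (Meta + 1 == Pattern.size() && Pattern[Meta] == '*')
      Kind = PatternKind::Prefix; // e.g. ".text.*"
    else
      Kind = PatternKind::Generic; // e.g. "*b.o", ".foo.?"
  }

  bool match(std::string_view S) const {
    // The literal prefix must match for every kind of pattern, so this
    // cheap check runs first.
    if (S.substr(0, Prefix.size()) != std::string_view(Prefix))
      return false;
    switch (Kind) {
    case PatternKind::Exact:
      return S.size() == Prefix.size();
    case PatternKind::Prefix:
      return true;
    case PatternKind::Generic:
      return simpleGlob(Pattern, S);
    }
    return false;
  }
};
```

Because every Generic pattern still starts with its literal prefix, checking that prefix first is always safe; it simply rejects most non-matching names before the recursive matcher runs.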

That probably makes sense, because globMatch still consumes about 11% of the running time in my case. But I'd do this in a separate patch once this one lands.

AntonBikineev added a subscriber: AntonBikineev. · Nov 2 2016, 9:43 AM

AntonBikineev added inline comments.

ELF/Strings.h
- Line 34 (On Diff #76696): Pat is not a forwarding reference in this context, so plain std::move can be used here.
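The distinction behind this comment: `T &&` is a forwarding reference only when `T` is deduced by a template. In a non-template constructor such as `SectionPattern(StringMatcher &&Re1, StringMatcher &&Re2)`, the parameters are plain rvalue references, so `std::forward<StringMatcher>(Re1)` is just a roundabout `std::move(Re1)`. A small C++17 sketch with a hypothetical `Tracker` type that records whether it was copied or moved:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical type that records how it was constructed.
struct Tracker {
  std::string How = "default";
  Tracker() = default;
  Tracker(const Tracker &) : How("copy") {}
  Tracker(Tracker &&) noexcept : How("move") {}
};

// Non-template constructor: 'T' is an ordinary rvalue reference, so
// std::move is the idiomatic cast.
struct Holder {
  explicit Holder(Tracker &&T) : Member(std::move(T)) {}
  Tracker Member;
};

// Applied to a named rvalue reference, std::forward<Tracker> yields the
// same type as std::move: Tracker&&.
static_assert(
    std::is_same_v<decltype(std::forward<Tracker>(std::declval<Tracker &>())),
                   decltype(std::move(std::declval<Tracker &>()))>,
    "forward<Tracker> on a named rvalue reference is just a move");

// Template with a deduced parameter type: 'Arg' is a forwarding reference,
// and std::forward preserves the caller's value category.
template <class T> std::string howConstructed(T &&Arg) {
  Tracker Local(std::forward<T>(Arg));
  return Local.How;
}
```

Only in `howConstructed` does `std::forward` do real work: an lvalue argument is copied and an rvalue argument is moved.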

I think the problem is that llvm::Regex is slow. If you want to optimize something, you might want to do that in llvm::Regex instead of on the LLD side. For example, if a pattern doesn't contain any metacharacters, llvm::Regex could use a simple string comparison. Or, if a pattern doesn't contain substring-capturing groups (e.g. /(.*)/), it could be compiled into a DFA instead of an NFA (I don't know whether llvm::Regex does this already).

> If you want to optimize something, you might want to do that in llvm::regex

AFAIK llvm::Regex is an NFA (it's actually taken from OpenBSD). Are you suggesting implementing an additional DFA engine for this purpose?

ruiu added inline comments. · Nov 2 2016, 11:09 AM

ELF/Strings.h
- Lines 32–33 (On Diff #76696): Add `explicit`.
- Line 36 (On Diff #76696): Move this definition to Strings.cpp and make globMatch static, so that globMatch stays at file scope.
- Line 42 (On Diff #76696): Make this private.
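Marking the constructors `explicit` stops a `StringRef` (or a vector of patterns) from turning into a matcher implicitly, for example when passed to a function that expects one. The sketch below shows the effect with type traits; `ExplicitMatcher` and `ImplicitMatcher` are hypothetical stand-ins, and `std::string_view` replaces `StringRef`:

```cpp
#include <cassert>
#include <string_view>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for StringMatcher, constructors marked explicit.
class ExplicitMatcher {
public:
  ExplicitMatcher() = default;
  explicit ExplicitMatcher(std::string_view P) : Patterns({P}) {}
  explicit ExplicitMatcher(std::vector<std::string_view> &&Pat)
      : Patterns(std::move(Pat)) {}

private:
  std::vector<std::string_view> Patterns;
};

// The same shape without 'explicit'.
class ImplicitMatcher {
public:
  ImplicitMatcher(std::string_view P) : Patterns({P}) {}

private:
  std::vector<std::string_view> Patterns;
};

// Both can be constructed directly from a string...
static_assert(std::is_constructible_v<ExplicitMatcher, std::string_view>);
static_assert(std::is_constructible_v<ImplicitMatcher, std::string_view>);
// ...but only the non-explicit one converts silently, e.g. when a string
// is passed to a function that takes a matcher by value.
static_assert(!std::is_convertible_v<std::string_view, ExplicitMatcher>);
static_assert(std::is_convertible_v<std::string_view, ImplicitMatcher>);
```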

You need to handle [characters], as George pointed out. Don't we have a test for that?
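Supporting [characters] would mean teaching globMatch about bracket expressions. The following is a hypothetical C++17 sketch, not the code that landed: it extends the globMatch loop from this diff with literal sets ([abc]), inclusive ranges ([a-z]) and negation with a leading '!' ([!abc], the POSIX shell convention; whether linker scripts should also accept '^' is left open). A '[' with no closing ']' is matched literally, and `std::string_view` stands in for `StringRef`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Outcome of testing one character against a bracket expression.
struct BracketResult {
  bool Matched;    // whether the character belongs to the set
  std::size_t Len; // length of the expression, including '[' and ']'
};

// P must start with '['. No escaping, and no way to put a literal ']' in
// the set. Returns std::nullopt if P has no closing ']'.
static std::optional<BracketResult> matchBracket(std::string_view P, char C) {
  std::size_t I = 1;
  bool Negate = false;
  if (I < P.size() && P[I] == '!') {
    Negate = true;
    ++I;
  }
  bool Found = false;
  for (; I < P.size(); ++I) {
    if (P[I] == ']')
      return BracketResult{Found != Negate, I + 1};
    if (I + 2 < P.size() && P[I + 1] == '-' && P[I + 2] != ']') {
      // Range such as a-z, inclusive on both ends.
      if (P[I] <= C && C <= P[I + 2])
        Found = true;
      I += 2;
    } else if (P[I] == C) {
      Found = true;
    }
  }
  return std::nullopt;
}

// The diff's globMatch loop, extended with bracket expressions.
static bool globMatch(std::string_view S, std::string_view T) {
  for (;;) {
    if (S.empty())
      return T.empty();
    if (S[0] == '*') {
      S.remove_prefix(1);
      if (S.empty())
        return true; // a trailing '*' matches anything
      // Try every suffix of T, including the empty one.
      for (std::size_t I = 0; I <= T.size(); ++I)
        if (globMatch(S, T.substr(I)))
          return true;
      return false;
    }
    if (T.empty())
      return false;
    if (S[0] == '[') {
      if (std::optional<BracketResult> R = matchBracket(S, T[0])) {
        if (!R->Matched)
          return false;
        S.remove_prefix(R->Len);
        T.remove_prefix(1);
        continue;
      }
      // No closing ']': fall through and treat '[' as a literal.
    }
    if (S[0] != T[0] && S[0] != '?')
      return false;
    S.remove_prefix(1);
    T.remove_prefix(1);
  }
}
```

Since a bracket expression consumes exactly one character of the input, the '*' backtracking is unchanged; for example, "[a-c]*.o" matches "b.o" but not "d.o".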

LGTM. Let's land this and improve in follow-up patches.

This revision is now accepted and ready to land. · Nov 2 2016, 4:08 PM

Closed by commit rL285895: Use globMatch() instead of llvm::regex in linker scripts (authored by evgeny777). · Nov 3 2016, 4:04 AM

This revision was automatically updated to reflect the committed changes.

evgeny777 mentioned this in D66613: [support][llvm-objcopy] Add support for shell wildcards. · Oct 10 2019, 4:36 AM

Revision Contents

- lld/trunk/ELF/LinkerScript.h: 16 lines
- lld/trunk/ELF/LinkerScript.cpp: 10 lines
- lld/trunk/ELF/Strings.h: 12 lines
- lld/trunk/ELF/Strings.cpp: 30 lines

Diff 76834

lld/trunk/ELF/LinkerScript.h

@@ ... @@
 #include "Strings.h"
 #include "Writer.h"
 #include "lld/Core/LLVM.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/DenseSet.h"
 #include "llvm/ADT/MapVector.h"
 #include "llvm/Support/Allocator.h"
 #include "llvm/Support/MemoryBuffer.h"
-#include "llvm/Support/Regex.h"
 #include <functional>

 namespace lld {
 namespace elf {
 class DefinedCommon;
 class ScriptParser;
 class SymbolBody;
 template <class ELFT> class InputSectionBase;
@@ ... @@ struct OutputSectionCommand : BaseCommand {
   std::vector<uint8_t> Filler;
   ConstraintKind Constraint = ConstraintKind::NoConstraint;
 };

 // This struct represents one section match pattern in SECTIONS() command.
 // It can optionally have negative match pattern for EXCLUDED_FILE command.
 // Also it may be surrounded with SORT() command, so contains sorting rules.
 struct SectionPattern {
-  SectionPattern(llvm::Regex &&Re1, llvm::Regex &&Re2)
-      : ExcludedFileRe(std::forward<llvm::Regex>(Re1)),
-        SectionRe(std::forward<llvm::Regex>(Re2)) {}
+  SectionPattern(StringMatcher &&Re1, StringMatcher &&Re2)
+      : ExcludedFileRe(std::forward<StringMatcher>(Re1)),
+        SectionRe(std::forward<StringMatcher>(Re2)) {}

   SectionPattern(SectionPattern &&Other) {
     std::swap(ExcludedFileRe, Other.ExcludedFileRe);
     std::swap(SectionRe, Other.SectionRe);
     std::swap(SortOuter, Other.SortOuter);
     std::swap(SortInner, Other.SortInner);
   }

-  llvm::Regex ExcludedFileRe;
-  llvm::Regex SectionRe;
+  StringMatcher ExcludedFileRe;
+  StringMatcher SectionRe;
   SortSectionPolicy SortOuter;
   SortSectionPolicy SortInner;
 };

 struct InputSectionDescription : BaseCommand {
   InputSectionDescription(StringRef FilePattern)
-      : BaseCommand(InputSectionKind),
-        FileRe(compileGlobPatterns({FilePattern})) {}
+      : BaseCommand(InputSectionKind), FileRe(FilePattern) {}
   static bool classof(const BaseCommand *C);
-  llvm::Regex FileRe;
+  StringMatcher FileRe;

   // Input sections that matches at least one of SectionPatterns
   // will be associated with this InputSectionDescription.
   std::vector<SectionPattern> SectionPatterns;

   std::vector<InputSectionData *> Sections;
 };

lld/trunk/ELF/LinkerScript.cpp

@@ ... @@ private:

   SymbolAssignment *readAssignment(StringRef Name);
   BytesDataCommand *readBytesDataCommand(StringRef Tok);
   std::vector<uint8_t> readFill();
   OutputSectionCommand *readOutputSectionDescription(StringRef OutSec);
   std::vector<uint8_t> readOutputSectionFiller(StringRef Tok);
   std::vector<StringRef> readOutputSectionPhdrs();
   InputSectionDescription *readInputSectionDescription(StringRef Tok);
-  Regex readFilePatterns();
+  StringMatcher readFilePatterns();
   std::vector<SectionPattern> readInputSectionsList();
   InputSectionDescription *readInputSectionRules(StringRef FilePattern);
   unsigned readPhdrType();
   SortSectionPolicy readSortKind();
   SymbolAssignment *readProvideHidden(bool Provide, bool Hidden);
   SymbolAssignment *readProvideOrAssignment(StringRef Tok);
   void readSort();
   Expr readAssert();
@@ ... @@ return StringSwitch<int>(Op)
       .Cases("*", "/", 5)
       .Cases("+", "-", 4)
       .Cases("<<", ">>", 3)
       .Cases("<", "<=", ">", ">=", "==", "!=", 2)
       .Cases("&", "|", 1)
       .Default(-1);
 }

-Regex ScriptParser::readFilePatterns() {
+StringMatcher ScriptParser::readFilePatterns() {
   std::vector<StringRef> V;
   while (!Error && !consume(")"))
     V.push_back(next());
-  return compileGlobPatterns(V);
+  return StringMatcher(std::move(V));
 }

 SortSectionPolicy ScriptParser::readSortKind() {
   if (consume("SORT") || consume("SORT_BY_NAME"))
     return SortSectionPolicy::Name;
   if (consume("SORT_BY_ALIGNMENT"))
     return SortSectionPolicy::Alignment;
   if (consume("SORT_BY_INIT_PRIORITY"))
     return SortSectionPolicy::Priority;
   if (consume("SORT_NONE"))
     return SortSectionPolicy::None;
   return SortSectionPolicy::Default;
 }

 // Method reads a list of sequence of excluded files and section globs given in
 // a following form: ((EXCLUDE_FILE(file_pattern+))? section_pattern+)+
 // Example: (.foo.1 EXCLUDE_FILE (a.o) .foo.2 EXCLUDE_FILE (*b.o) .foo.3)
 // The semantics of that is next:
 // * Include .foo.1 from every file.
 // * Include .foo.2 from every file but a.o
 // * Include .foo.3 from every file but b.o
 std::vector<SectionPattern> ScriptParser::readInputSectionsList() {
   std::vector<SectionPattern> Ret;
   while (!Error && peek() != ")") {
-    Regex ExcludeFileRe;
+    StringMatcher ExcludeFileRe;
     if (consume("EXCLUDE_FILE")) {
       expect("(");
       ExcludeFileRe = readFilePatterns();
     }

     std::vector<StringRef> V;
     while (!Error && peek() != ")" && peek() != "EXCLUDE_FILE")
       V.push_back(next());

     if (!V.empty())
-      Ret.push_back({std::move(ExcludeFileRe), compileGlobPatterns(V)});
+      Ret.push_back({std::move(ExcludeFileRe), StringMatcher(std::move(V))});
     else
       setError("section pattern is expected");
   }
   return Ret;
 }

 // Section pattern grammar can have complex expressions, for example:
 // (SORT(.foo. EXCLUDE_FILE (file1.o) .bar.) .bar.* SORT(.zed.*))

lld/trunk/ELF/Strings.h

@@ ... @@
 namespace elf {
 llvm::Regex compileGlobPatterns(ArrayRef<StringRef> V);
 int getPriority(StringRef S);
 bool hasWildcard(StringRef S);
 std::vector<uint8_t> parseHex(StringRef S);
 bool isValidCIdentifier(StringRef S);
 StringRef unquote(StringRef S);

+class StringMatcher {
+public:
+  StringMatcher() = default;
+  explicit StringMatcher(StringRef P) : Patterns({P}) {}
+  explicit StringMatcher(std::vector<StringRef> &&Pat)
+      : Patterns(std::move(Pat)) {}
+
+  bool match(StringRef S);
+private:
+  std::vector<StringRef> Patterns;
+};
+
 // Returns a demangled C++ symbol name. If Name is not a mangled
 // name or the system does not provide __cxa_demangle function,
 // it returns an unmodified string.
 std::string demangle(StringRef Name);

 // Demangle if Config->Demangle is true.
 std::string maybeDemangle(StringRef Name);

 inline StringRef toStringRef(ArrayRef<uint8_t> Arr) {
   return {(const char *)Arr.data(), Arr.size()};
 }
 }
 }

 #endif

lld/trunk/ELF/Strings.cpp

@@ ... @@
 #include "llvm/Config/config.h"
 #include "llvm/Demangle/Demangle.h"
 #include <algorithm>

 using namespace llvm;
 using namespace lld;
 using namespace lld::elf;

+// Returns true if S matches T. S can contain glob meta-characters.
+// The asterisk ('*') matches zero or more characters, and the question
+// mark ('?') matches one character.
+static bool globMatch(StringRef S, StringRef T) {
+  for (;;) {
+    if (S.empty())
+      return T.empty();
+    if (S[0] == '*') {
+      S = S.substr(1);
+      if (S.empty())
+        // Fast path. If a pattern is '*', it matches anything.
+        return true;
+      for (size_t I = 0, E = T.size(); I < E; ++I)
+        if (globMatch(S, T.substr(I)))
+          return true;
+      return false;
+    }
+    if (T.empty() || (S[0] != T[0] && S[0] != '?'))
+      return false;
+    S = S.substr(1);
+    T = T.substr(1);
+  }
+}
+
+bool StringMatcher::match(StringRef S) {
+  for (StringRef P : Patterns)
+    if (globMatch(P, S))
+      return true;
+  return false;
+}
 // If an input string is in the form of "foo.N" where N is a number,
 // return N. Otherwise, returns 65536, which is one greater than the
 // lowest priority.
 int elf::getPriority(StringRef S) {
   size_t Pos = S.rfind('.');
   if (Pos == StringRef::npos)
     return 65536;
   int V;
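For experimenting outside the lld tree, the globMatch and StringMatcher code from this diff can be ported to standard C++17 almost line for line. In this sketch `std::string_view` stands in for `llvm::StringRef` (an assumption, not what lld uses), and `match()` is made `const`. One change is intentional: the loop after '*' runs while `I <= E` rather than `I < E`. Tracing the committed version by hand, a pattern ending in consecutive stars such as "a**" does not match "a", because the empty suffix of T is never tried; the port tries it.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Port of globMatch: '*' matches zero or more characters, '?' matches one.
static bool globMatch(std::string_view S, std::string_view T) {
  for (;;) {
    if (S.empty())
      return T.empty();
    if (S[0] == '*') {
      S = S.substr(1);
      if (S.empty())
        // Fast path. If a pattern is '*', it matches anything.
        return true;
      // '<=' (not '<') so that the empty suffix of T is also tried.
      for (std::size_t I = 0, E = T.size(); I <= E; ++I)
        if (globMatch(S, T.substr(I)))
          return true;
      return false;
    }
    if (T.empty() || (S[0] != T[0] && S[0] != '?'))
      return false;
    S = S.substr(1);
    T = T.substr(1);
  }
}

// Port of StringMatcher: a name matches if any of the patterns matches.
// A default-constructed matcher has no patterns and matches nothing.
class StringMatcher {
public:
  StringMatcher() = default;
  explicit StringMatcher(std::string_view P) : Patterns({P}) {}
  explicit StringMatcher(std::vector<std::string_view> &&Pat)
      : Patterns(std::move(Pat)) {}

  bool match(std::string_view S) const { // const here, unlike the diff
    for (std::string_view P : Patterns)
      if (globMatch(P, S))
        return true;
    return false;
  }

private:
  std::vector<std::string_view> Patterns;
};
```

As with StringRef in the original, the matcher only stores views, so the pattern strings must outlive it.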