This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Speed-up lld up to 5 times by replacing llvm:regex with simple string matcher.
ClosedPublic

Authored by evgeny777 on Nov 2 2016, 6:03 AM.

Details

Summary

This patch reduces linking time of libcxx (libcxx.llvm.org) test suite from 11.1 sec to 2.3 sec on my PC (i7-2600K, 16GB RAM, SSD).
Can provide VSP (Visual Studio performance report) files on request.

Diff Detail

Repository
rL LLVM

Event Timeline

evgeny777 updated this revision to Diff 76696.Nov 2 2016, 6:03 AM
evgeny777 retitled this revision from to [ELF] Speed-up lld up to 5 times by replacing llvm:regex with simple string matcher..
evgeny777 updated this object.
evgeny777 added reviewers: ruiu, rafael.
evgeny777 set the repository for this revision to rL LLVM.
evgeny777 added a project: lld.
evgeny777 added subscribers: grimar, ikudrin, llvm-commits.
grimar added a comment.Nov 2 2016, 6:24 AM

I wonder if that really passes all the tests ?
My concern comes from https://reviews.llvm.org/D23803?vs=on&id=68992&whitespace=ignore-most#toc which was initially posted to implement
[char].
That patch later was converted to use regexps instead of initial implementation. Though implementation in your patch seems does not handle [char]. Don't our tests should catch that ?

Can you point out the test which should fail? I have all them passing (Ubuntu host).
Also it looks like globMatch does support character wildcard (see Strings.cpp:41)

emaste added a subscriber: emaste.Nov 2 2016, 6:34 AM
grimar added a comment.EditedNov 2 2016, 6:45 AM

So if only linker script was switched to use globMatch(), then other places will not fail. Like version-script-complex-wildcards.s.
There is some inconsistency though, may be we might want to switch all places from using regexp after that one ?

joerg added a subscriber: joerg.Nov 2 2016, 7:45 AM

Wouldn't it be better in general to handle fixed string prefixes directly in the matcher and only fallback to a more generic matcher if the prefix check passes? Independent of glob or regex, there are three basic situations:

  • fixed string match --> done after the prefix match
  • fixed prefix, wildcard rest --> done after the prefix match
  • rest

I wonder how much of the performance benefit is obtained by just doing this peephole optimisation.

That probably makes sense, because globMatch still consumes about 11% of running time in my case. But I'd do this in separate patch if this one lands.

AntonBikineev added inline comments.
ELF/Strings.h
34 ↗(On Diff #76696)

Pat is not a forwarding reference in this context, so just move can be used here

ruiu edited edge metadata.Nov 2 2016, 9:44 AM

I think the problem is that llvm::regex is slow. If you want to optimize something, you might want to do that in llvm::regex instead of doing it on LLD side. For example, if a pattern doesn't contain any meta character, llvm::regex can use a simple string comparison. Or, if a pattern doesn't contain a substring-capturing patterns (e.g. /(.*)/), it can compile the regex into a DFA instead of NFA (I don't know if llvm::regex does this already).

If you want to optimize something, you might want to do that in llvm::regex

AFAIK llvm::regex is NFA (it's actually a rip from OpenBSD). Are you suggesting to implement additional DFA engine for this purpose?

ruiu added inline comments.Nov 2 2016, 11:09 AM
ELF/Strings.h
32–33 ↗(On Diff #76696)

Add explicit.

36 ↗(On Diff #76696)

Move this definition to Strings.cpp and make globMatch static, so that we can keep globMatch inside a file-scope.

42 ↗(On Diff #76696)

Make this private.

ruiu added a comment.Nov 2 2016, 11:13 AM

You need to handle [characters] as George pointed out. Don't we have a test for that?

ruiu accepted this revision.Nov 2 2016, 4:08 PM
ruiu edited edge metadata.

LGTM. Let's land this and improve in follow-up patches.

This revision is now accepted and ready to land.Nov 2 2016, 4:08 PM
This revision was automatically updated to reflect the committed changes.