This patch reduces linking time of libcxx (libcxx.llvm.org) test suite from 11.1 sec to 2.3 sec on my PC (i7-2600K, 16GB RAM, SSD).
Can provide VSP (Visual Studio performance report) files on request.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
I wonder if that really passes all the tests ?
My concern comes from https://reviews.llvm.org/D23803?vs=on&id=68992&whitespace=ignore-most#toc which was initially posted to implement
[char].
That patch later was converted to use regexps instead of initial implementation. Though implementation in your patch seems does not handle [char]. Don't our tests should catch that ?
Can you point out the test which should fail? I have all them passing (Ubuntu host).
Also it looks like globMatch does support character wildcard (see Strings.cpp:41)
So if only linker script was switched to use globMatch(), then other places will not fail. Like version-script-complex-wildcards.s.
There is some inconsistency though, may be we might want to switch all places from using regexp after that one ?
Wouldn't it be better in general to handle fixed string prefixes directly in the matcher and only fallback to a more generic matcher if the prefix check passes? Independent of glob or regex, there are three basic situations:
- fixed string match --> done after the prefix match
- fixed prefix, wildcard rest --> done after the prefix match
- rest
I wonder how much of the performance benefit is obtained by just doing this peephole optimisation.
That probably makes sense, because globMatch still consumes about 11% of running time in my case. But I'd do this in separate patch if this one lands.
ELF/Strings.h | ||
---|---|---|
34 ↗ | (On Diff #76696) | Pat is not a forwarding reference in this context, so just move can be used here |
I think the problem is that llvm::regex is slow. If you want to optimize something, you might want to do that in llvm::regex instead of doing it on LLD side. For example, if a pattern doesn't contain any meta character, llvm::regex can use a simple string comparison. Or, if a pattern doesn't contain a substring-capturing patterns (e.g. /(.*)/), it can compile the regex into a DFA instead of NFA (I don't know if llvm::regex does this already).
If you want to optimize something, you might want to do that in llvm::regex
AFAIK llvm::regex is NFA (it's actually a rip from OpenBSD). Are you suggesting to implement additional DFA engine for this purpose?
You need to handle [characters] as George pointed out. Don't we have a test for that?