This is an archive of the discontinued LLVM Phabricator instance.

Add preload method.
Needs Review, Public

Authored by ruiu on Sep 12 2014, 4:00 PM.

Details

Summary

This patch is meant to start a discussion about performance improvements
and is not ready for submission. It doesn't compile.

Problem:

Currently LLD parses .o files given via the command line in parallel
(see Driver.cpp). That's better than nothing, but not good enough to
use as many of the available cores as possible, because .a file parsing
is not parallelized.

When the resolver finds a symbol in an archive file, it extracts the
file from the archive, parses it, adds its symbols to the symbol table,
and then continues. It's a serial process.

Solution:

Add a preload() member function to InputGraph to let it start a
(lightweight) task in the background to parse a file. With this we would
have multiple tasks running in the background. The effect of preload() is
that it may shorten the response time of ArchiveFile::find(); other than
that it has no effect observable from outside.
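
To make the intended semantics concrete, here is a minimal sketch, not the patch itself. ArchiveSketch, ParsedFile and parseMember are hypothetical stand-ins rather than LLD types, and std::async is used only for brevity; a real implementation would probably want a bounded task pool instead.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a parsed archive member; not an LLD type.
struct ParsedFile {
  std::string name;
};

// Hypothetical stand-in for the expensive parse step.
inline ParsedFile parseMember(const std::string &name) {
  return ParsedFile{name};
}

// Sketch of an archive whose members can be parsed ahead of time.
class ArchiveSketch {
public:
  // Starts parsing `member` in the background and returns immediately.
  // Calling it again for the same member does nothing.
  void preload(const std::string &member) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (pending_.count(member))
      return;
    pending_.emplace(
        member,
        std::async(std::launch::async, parseMember, member).share());
  }

  // Returns the parsed member, blocking until parsing has finished.
  // The result is the same whether or not preload() was called first.
  ParsedFile find(const std::string &member) {
    std::shared_future<ParsedFile> result;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = pending_.find(member);
      if (it == pending_.end()) {
        // Not preloaded: parse lazily on the calling thread at get().
        it = pending_
                 .emplace(member, std::async(std::launch::deferred,
                                             parseMember, member)
                                      .share())
                 .first;
      }
      result = it->second;
    }
    assert(result.valid());
    return result.get();
  }

private:
  std::mutex mutex_;
  std::map<std::string, std::shared_future<ParsedFile>> pending_;
};
```

In this sketch find() returns the same value with or without a prior preload() call, so the only difference a caller could notice is latency.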

I think this is the simplest interface for doing what I want to do. Any
opinions?

Event Timeline

ruiu updated this revision to Diff 13667. Sep 12 2014, 4:00 PM
ruiu retitled this revision to "Add preload method."
ruiu updated this object.
ruiu edited the test plan for this revision.
ruiu added reviewers: t.p.northover, kledzik.
ruiu added a subscriber: Unknown Object (MLST).
kledzik edited edge metadata. Sep 12 2014, 4:22 PM

I assume that preload() returns immediately and is expected to spin off a thread to parse an archive member? If so, we have no overall throttle on how many threads will be started (a hundred undefines could spin up 100 threads). Also, how does the archive reader coordinate when the Resolver reaches the point where it really needs an object file to fulfill an undefine, but some other thread is busy parsing that member?

Don't we just have a producer/consumer problem, where the archive reader is the producer and the resolver is the consumer? The consumer is single-threaded and currently queries (pulls from) the producer on the consumer thread. Can the driver start up some producer task for archive reading to pre-parse archives? Rather than blindly parsing all members, your idea of passing undefined symbol names to the producer is a good one.
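
One way to get such a throttle is a fixed-size worker pool fed from a queue. The following is only a sketch under that assumption: BoundedPool and runThrottled are hypothetical names, not existing LLD classes, and the parse jobs are dummies.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: at most `workers` tasks run at once,
// no matter how many have been queued.
class BoundedPool {
public:
  explicit BoundedPool(std::size_t workers) {
    assert(workers > 0);
    for (std::size_t i = 0; i < workers; ++i)
      threads_.emplace_back([this] { run(); });
  }

  // Lets the workers drain the queue, then joins them.
  ~BoundedPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_all();
    for (std::thread &t : threads_)
      t.join();
  }

  // Queues a task and returns immediately.
  void submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty())
          return; // done_ is set and nothing is left to run
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      task(); // run outside the lock so other workers can proceed
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool done_ = false;
  std::vector<std::thread> threads_;
};

// Usage sketch: runs `tasks` dummy parse jobs on `workers` threads and
// returns how many of them completed.
inline int runThrottled(int tasks, std::size_t workers) {
  std::atomic<int> completed{0};
  {
    BoundedPool pool(workers);
    for (int i = 0; i < tasks; ++i)
      pool.submit([&completed] { ++completed; });
  } // the destructor waits for every queued job
  return completed.load();
}
```

With this shape, a hundred undefines would queue a hundred jobs but never run more than `workers` of them at the same time.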

a) The undefined symbol could very well be in a shared library or a regular ELF object, which is already parsed by a thread.
b) There might be more than one archive library or .o file with the same symbol name.
c) Weak symbol resolution has to be taken care of as well: the symbol name would be the same, and the symbol may be picked up from a different archive depending on how the weak symbol was resolved.
d) The InputGraph is agnostic to which input in the graph contains the symbol name.

We could have a background thread designated to only parse archive files by looking at undefined symbols.
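
As a rough sketch of the selection step such a thread might perform (membersToPreload is a hypothetical helper, and the archive symbol table is modelled as a plain map from symbol name to member name, which is an assumption): the result is only a hint for speculative parsing, and the resolver would still make the final choice in cases (b) and (c).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not an LLD API. `archiveIndex` maps a symbol name
// to the archive member that defines it, as an archive symbol table would.
// Returns the members worth parsing ahead of time, without duplicates,
// in the order the undefined symbols were given.
inline std::vector<std::string>
membersToPreload(const std::map<std::string, std::string> &archiveIndex,
                 const std::set<std::string> &alreadyDefined,
                 const std::vector<std::string> &undefined) {
  std::vector<std::string> result;
  std::set<std::string> seen;
  for (const std::string &sym : undefined) {
    // Case (a): the symbol is already provided by an object that has
    // been parsed, so no archive member is needed for it.
    if (alreadyDefined.count(sym))
      continue;
    auto it = archiveIndex.find(sym);
    if (it == archiveIndex.end())
      continue; // this archive does not define the symbol
    assert(!it->second.empty() && "index entries must name a member");
    // One member can define several symbols; schedule it only once.
    if (seen.insert(it->second).second)
      result.push_back(it->second);
  }
  return result;
}
```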

What do you think?

include/lld/Core/InputGraph.h
85

Spelling errors.