This is an archive of the discontinued LLVM Phabricator instance.

[lld][Driver] Parallelize reading initial object files.
AbandonedPublic

Authored by Bigcheese on Apr 12 2013, 11:45 AM.

Diff Detail

Event Timeline

I think it would be nice to have an option to not use any threads, for debugging / testing purposes. It could also be an option to benchmark / experiment things with number of threads spawned inside the thread pool.

lib/Driver/Driver.cpp
47

Why you want this to be atomic bool ? because you are only setting the value to be true and the check/return is done only after all the threads have got synced ?

Also it would be essential to check if there was a failure reading one of the files before reading the next set of files.

For example:

Say thread concurrency = 2

TG created 2 threads and there were 4 files

Say there was an error reading file 1 (or) file 2. You could stop file 3/4 from being read.

62–67

Isnt using index a problem with archive files.

--force-load-archive option / multiple archive files getting pulled in ?

lib/ReaderWriter/ELF/ELFTargetInfo.cpp
95–120 ↗(On Diff #1614)

I think we could create the _elfReader/_yamlReader/_linkerScriptReader as soon as ELFTargetInfo is initialized. Because after the first time the _elfReader/_yamlReader/_linkerScriptReader is initialized, you dont need the locks. This implementation will lock them and make it slower too, because of the locks / unlocks being done multiple times.

lib/ReaderWriter/ELF/File.h
314

did you mean iterate over sections by their positions in the object file ?

315–325

Was there an issue earlier ?

In future may be, atoms from each section can be parallelized too, which will make the reader very powerful. The references vector may need to change.

What do you think ?

Bigcheese added inline comments.Apr 12 2013, 1:21 PM
lib/Driver/Driver.cpp
47

It has to be atomic because it could be written from multiple threads.

I don't see why it's essential to early exit the other tasks. It is indeed possible though. Just check the value of fail.

62–67

files is a std::vector<std::vector<std::unique_ptr<File>>>. Each input argument gets a std::vector.

lib/ReaderWriter/ELF/ELFTargetInfo.cpp
95–120 ↗(On Diff #1614)

k

lib/ReaderWriter/ELF/File.h
314

Yes, that's what that says.

315–325

I changed sectionSymbols to a DenseMap instead of a std::map. This is required because DenseMap is unordered.

In future may be, atoms from each section can be parallelized too, which will make the reader very powerful. The references vector may need to change.

What do you think ?

A reader can do this, sure. We just need to ensure the section is large enough to overcome the cost of creating a task.

shankarke added inline comments.Apr 12 2013, 1:32 PM
lib/Driver/Driver.cpp
47

But all threads would write only a value true. Atomic would be needed only if one thread is writing true and other false, and you need to sync between them right ?

Say that 3.o and 4.o were very huge files. Linker is going to anyways exit with an error, I see no point of reading 3.o / 4.o.

62–67

ah ok.

Bigcheese added inline comments.Apr 12 2013, 4:02 PM
lib/Driver/Driver.cpp
47

[intro.multithreaded] 4 Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.

[intro.multithreaded] 21 The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

Thus synchronization is required.

And yes, early exit is desirable.

kledzik added inline comments.Apr 12 2013, 4:04 PM
lib/Driver/Driver.cpp
53–67

Is there a reason why the high level readFile() in the original was split up into its two low level steps of getBuffer() and parseFile()? Using the low level steps makes this code more complicated.

Also, if multiple files have errors, you can get intermixed error message since you have lots of threads writing to diagnostics at once. Another way would be to have a parallel array of error_codes and each thread/lambda just sets its error code in ec[index] and returns. Then after the tg.sync(), you walk the ec array in order and if there is an error, you write it out. That way every run results in the same error messages in the same order, and you don't need the atomic "fail" anymore (unless you want it to terminate early).

Bigcheese added inline comments.Apr 12 2013, 5:50 PM
lib/Driver/Driver.cpp
53–67

It uses getBuffer and parseFile so that we don't mmap the file multiple times.

Good idea for synchronizing the errors.

kledzik added inline comments.Apr 12 2013, 6:25 PM
lib/Driver/Driver.cpp
53–67

The multiple mappings is independent of this parallelization (i.e. could be separate patch). We should overload readFile() to take a LinkerInput or a path.

Sometimes the reader needs to take ownership of the buffer and sometimes it does not. Which is why parseFile() takes the unique_ptr<MemoryBuffer> by reference. This intermediate getMemBuffer() object muddies the water.

Bigcheese updated this revision to Unknown Object (????).Apr 19 2013, 5:31 PM

Address comments and split out reader fix.

ruiu added a comment.Aug 7 2013, 4:52 PM

Resurrecting the old thread.

I'm thinking to rewrite the concurrent file parsing part using std::future and std::async. We can get performance benefit from doing that, as the core linker will not have to wait all the input files to be parsed (but instead will be blocked when it tries to use a future which is not ready). It will also simplifies the code because it's the standard library.

What I'm wondering is why this code did not use std::future and std::async from the beginning. Was there any reason to avoid that?

ruiu added a comment.Aug 8 2013, 10:23 PM

I'll start writing a patch to use std::future to represent a file object which may or may not be being parsed. If there's an issue, please yell at me.

Bigcheese abandoned this revision.Nov 14 2019, 1:07 PM