This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lld/MachO/
-
MachO/
26/26
Writer.cpp

Differential D115416

[lld-macho] Make writing map file asynchronous
ClosedPublic

Authored by thevinster on Dec 8 2021, 6:26 PM.

Download Raw Diff

Details

Reviewers

oontvoo
int3

Group Reviewers

Restricted Project

Commits

rGd17b092fe690: [lld-macho] Make writing map file asynchronous

Summary

For large applications that write to map files, writing map files can take quite
a bit of time. Sorting the biggest contributors to link times, writing map files
ranks in at 2nd place, with load input files being the biggest contributor of
link times. Avoiding writing map files on the critical path (and having its own
thread) saves ~2-3 seconds when linking chromium framework on a 16-Core
Intel Xeon W.

           base            diff            difference (95% CI)
sys_time   1.617 ± 0.034   1.657 ± 0.026   [  +1.5% ..   +3.5%]
user_time  28.536 ± 0.245  28.609 ± 0.180  [  -0.1% ..   +0.7%]
wall_time  23.833 ± 0.271  21.684 ± 0.194  [  -9.5% ..   -8.5%]
samples    31              24

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

thevinster created this revision.Dec 8 2021, 6:26 PM

Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2021, 6:26 PM

Herald added a reviewer: Restricted Project. · View Herald Transcript

thevinster edited the summary of this revision. (Show Details)Dec 8 2021, 6:31 PM

thevinster published this revision for review.Dec 8 2021, 6:33 PM

Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2021, 6:33 PM

Herald added a subscriber: llvm-commits. · View Herald Transcript

oontvoo added a subscriber: oontvoo.Dec 8 2021, 6:39 PM

oontvoo added inline comments.

lld/MachO/Writer.cpp
1163–1164

This revision is now accepted and ready to land.Dec 8 2021, 6:39 PM

did you forget to add the implementation of writeMapAsync?

lld/MachO/Writer.cpp
1163–1164	yeah, I don't think the map file contains anything from LinkEdit, but just to be safe I would move it after everything's finalized. I'm assuming `writeOutputFile` takes longer than `writeMapAsync` anyway

This revision now requires changes to proceed.Dec 8 2021, 8:16 PM

In D115416#3181832, @int3 wrote:

did you forget to add the implementation of writeMapAsync?

Isn't line 1151, writeMapAsync a variable? :) (not a func call)

lld/MachO/Writer.cpp
1163–1164	PS: actually, dont you want `std::future<void> writeMapAsync (std::async(std::launch::async, []{ writeMapFile(); }));` ?

Isn't line 1151, writeMapAsync a variable? :) (not a func call)

d'oh, yes, brain fart there...

lld/MachO/Writer.cpp
1163–1164	yup, based on https://en.cppreference.com/w/cpp/thread/async I think @oontvoo 's right. Also, that page says If the std::future obtained from std::async is not moved from or bound to a reference, the destructor of the std::future will block at the end of the full expression until the asynchronous operation completes so I don't think the `get()` call below is necessary?

I'm assuming writeOutputFile takes longer than writeMapAsync anyway

Profiling on my end shows that writeMapFile actually takes longer than writeOutputFile and on some cases longer than finalizeLinkEditSegment. I personally believe we get more wins from if we did writeMapFile before finalizeLinkEditSegment. But, if we plan on having writeMapFile depend on contents in finalizeLinkEditSegment in the future, I'm happy to move it after finalizeLinkEditSegment.

dont you want std::future<void> writeMapAsync (std::async(std::launch::async, []{ writeMapFile(); })); ?

By not providing the policy, it lets the system decide whether it should be async or deferred. The async policy is preferred, but the system will choose deferred if the system is running out of memory and can't spawn another thread in efforts to not hard crash. I think we can agree that it's better not to crash if it means having to not run it in parallel? Also, the lambda isn't necessary at all. It's working without the lambda on my end, reference docs don't specify that it must be wrapped in a lambda.

so I don't think the get() call below is necessary?

In the majority of cases, it's probably not necessary, but I think it's better to be safer than sorry. If the implementation changes behind the scenes for us and starts producing undefined behaviors, then it might not be obvious where crashes originate from right?

Profiling on my end shows that writeMapFile actually takes longer than writeOutputFile and on some cases longer than finalizeLinkEditSegment.

Oh interesting... well we can stick with this order then

The async policy is preferred, but the system will choose deferred if the system is running out of memory and can't spawn another thread in efforts to not hard crash.

Do you have a source for this? Because that's not what cppreference says... it just says

If both the std::launch::async and std::launch::deferred flags are set in policy, it is up to the implementation whether to perform asynchronous execution or lazy evaluation.

I agree the lambda isn't necessary.

If the implementation changes behind the scenes for us

cppreference describes what the standard indicates implementations should do, so this shouldn't be a concern

Harbormaster completed remote builds in B138342: Diff 393007.Dec 8 2021, 9:50 PM

In D115416#3181890, @int3 wrote:

Profiling on my end shows that writeMapFile actually takes longer than writeOutputFile and on some cases longer than finalizeLinkEditSegment.

Oh interesting... well we can stick with this order then

The async policy is preferred, but the system will choose deferred if the system is running out of memory and can't spawn another thread in efforts to not hard crash.

Do you have a source for this? Because that's not what cppreference says... it just says

If both the std::launch::async and std::launch::deferred flags are set in policy, it is up to the implementation whether to perform asynchronous execution or lazy evaluation.

I agree the lambda isn't necessary.

If the implementation changes behind the scenes for us

cppreference describes what the standard indicates implementations should do, so this shouldn't be a concern

Ok so looks like there are two options:

if we don't specify the policy explicitly, then we need the .get() call to ensure the future actually happens.
if we do specify the policy to async, then no need to call .get();

(What's worse is that all of this is implementation defined, which is to say, no guaranteed it wouldn't change).

I guess there are no great options here but the current one is fine (ie., first option). Can we at least edit the comment to say it more clearly that since the policy is not specified, it may be lazy or async therefore we need to call .get() to get it eval'ed?

(Same question with int3, though - where does it say the impl will pick the best policy based on thread/memory?)

P.S: Another suggestion (since std::future in generally isn't great and I've heard WG21 wasn't using it in any future design), maybe just use LLVM's parallel API?
(ie., use the parallel ...() to run writeMapFile() in parallel with { finalizeLinkEditSegment(); writeOutputFile(); } )
This could be better than mixing multiple threading APIs

Do you have a source for this?

Over at https://www.cplusplus.com/reference/future/async/ under automatic selection, it says Automatic: The function chooses the policy automatically (at some point). This depends on the system and library implementation, which generally optimizes for the current availability of concurrency in the system. I think this means that the system will be smart enough to know which one it should use based on its available resources.

Also in the cppreference you have linked, there's a section on Exceptions that says Throws std::system_error with error condition std::errc::resource_unavailable_try_again if the launch policy equals std::launch::async and the implementation is unable to start a new thread (if the policy is async|deferred or has additional bits set, it will fall back to deferred or the implementation-defined policies in this case).

use the parallel ...() to run writeMapFile() in parallel with { finalizeLinkEditSegment(); writeOutputFile(); }

This was my initial thought. I abandoned this approach because 1/ I didn't quite know what the right syntax is to get this particular thing working and 2/ we aren't really trying to run writeMapFile in parallel with *only* finalizeLinkEditSegment or writeOutputFile. It's more about putting things off the critical path. Let's say finalizeLinkEditSegment is its own thread (for whatever reason) then things gets tricky since we would have a main thread, writeMapFile thread, and the finalizeLinkEditSegment thread. With async it's easy, but how does that look like with parallel?

In D115416#3183365, @thevinster wrote:

This was my initial thought. I abandoned this approach because 1/ I didn't quite know what the right syntax is to get this particular thing working and 2/ we aren't really trying to run writeMapFile in parallel with *only* finalizeLinkEditSegment or writeOutputFile. It's more about putting things off the critical path. Let's say finalizeLinkEditSegment is its own thread (for whatever reason) then things gets tricky since we would have a main thread, writeMapFile thread, and the finalizeLinkEditSegment thread. With async it's easy, but how does that look like with parallel?

(see suggested edit inline)

I'm not sure understand how running writeMapFile() in its on thread (parallel with the other two functions) any different here ... Either way, the "main" thread still have to block until all three are finished before it can return from this.

So the premise here is that writeMap is slow, hence we want to do something else in parallel with it rather than waiting for it , yes?

lld/MachO/Writer.cpp
1162–1166	perhaps something like this ^ ?

I'll admit your suggestion is clever, but I think it sacrifices on readability. How would something like this be modeled in the parallelForEach statement?

std::async(func1);
func2();
std::async(func3);
func4();

With the parallelForEach, it sounds like we'll need to nest things because parallelForEach isn't meant to model concurrency but more about parallelism.

Ah, thanks for the sources @thevinster! I was actually gonna agree that async | deferred seemed like the best choice, but then I found this: https://eli.thegreenplace.net/2016/the-promises-and-challenges-of-stdasync-task-based-parallelism-in-c11/

It makes the argument that the default policy should "never" be used because of all the potential pitfalls. I don't think any of those pitfalls apply to our relatively simple use case, but maybe it's nice to just use the async policy regardless so we don't have to think about that? But in any case...

I agree with @oontvoo that using raw std::async is not ideal since it's semi-deprecated. I'll also point out (belatedly) that it's not great to be running an async job in parallel with a call to parallelForEachN. The latter spawns jobs based on the number of cores available because it assumes that it is the only thing spawning jobs. Ideally, we would be spawning jobs from a single thread pool instead.

Parallel.h is designed for simple fork-join patterns where the forking and joining all happen at a single point, but that's not really what we're doing here. ThreadPool.h seems to be a better fit for what we are trying to do; it exposes a very similar API to std::async but I think without its pitfalls. If I'm reading the implementation correctly, Pool.async() always launches the jobs immediately if there is a thread available, and always launches them in a separate thread even if they get deferred. And it's being used throughout LLVM's codebase, so we should probably use it too :)

Here's an example we could draw on: https://github.com/llvm/llvm-project/blob/be0a7e9f27083ada6072fcc0711ffa5630daa5ec/mlir/include/mlir/IR/Threading.h#L76. I think we'd create one ThreadPool, use it to dispatch writeMapFile, then pass the same instance to finalizeLinkEditSegment, which dispatches more jobs with it (but only waits on those jobs it dispatched, and not the writeMapFile job). Also the ThreadPool dtor is blocking, so we don't actually need to wait on the writeMapFile job.

thevinster planned changes to this revision.Dec 10 2021, 11:17 AM

Use threadpool

thevinster edited the summary of this revision. (Show Details)Dec 10 2021, 6:31 PM

Harbormaster completed remote builds in B138776: Diff 393648.Dec 10 2021, 6:45 PM

Looks cool, but have you tried optimizing the map writing code a bit first? IIRC it's fairly unoptimized.

(But moving it on a thread and optimizing it are both good things to do of course.)

Looks cool, but have you tried optimizing the map writing code a bit first? IIRC it's fairly unoptimized.

Thanks for taking a glance! That is one area I'm thinking of improving as well, but right now this seemed like a lower hanging fruit that should give us relatively significant wins.

@thakis I think they're orthogonal and equally useful improvements. One reason we haven't optimized the actual MapFile generation yet is that we're keeping it as ramp-up work for a new team member :)

lld/MachO/Writer.cpp
68	super nit, but "writer" here seems redundant since this is the only threadpool used in Writer
1164–1165	so as my earlier block comment was saying, we should pass `writerThreadPool` to `finalizeLinkEditSegment` too and use that in lieu of `parallelForEachN`, so that we maintain control of the total number of threads.

thevinster added inline comments.Dec 13 2021, 11:04 AM

lld/MachO/Writer.cpp
68	I think it might be more relevant now we that want to use it in `finalizeLinkEditSegment`. Any reason why we want to instantiate it in the `run` scope as opposed to having it as a member of the class? It's possible we may want to use it elsewhere within the `Writer` class? In fact, scanning usages of `parallelForEach{N}` also shows `writeUuid` benefiting from using async as well. In that case, it makes sense to put it at the class level scope.
1164–1165	Oh I see. It wasn't clear from your previous comment that you wanted to replace `parallelForEachN` with it. I'll change.

int3 added inline comments.Dec 13 2021, 11:13 AM

lld/MachO/Writer.cpp
68	I have no issues with the scope. I'm just saying that having "writer" in the variable name seems redundant

thevinster planned changes to this revision.Dec 13 2021, 11:15 AM

thevinster added inline comments.

lld/MachO/Writer.cpp
68	My bad. I misinterpreted the statement. 👍

Show two different ways of using threadPool

Not meant as a commit for submitting, but wanted to poll the audience on what pattern we'd like to use with threadPool. We essentially have three options here:

Option 1 - continue using parallelForEach methods
Option 2 - Re-use our declared class level threadPool, but areas where we only want to wait on a subset of threads, we have to re-implement the fork-join model by declaring some vector and waiting on those threads that are pushed back to the vector (See L1040-1050)
Option 3 - Declare class-level and method-level threadPools. Simpifies the for loop from Option 2, but might be harder to manage since we now have to control N number of threadPools with a potential of N different concurrency settings. (See L1103-1115)

I did run a benchmark across all options and there's no reasonable performance improvement with either options. So, it seems like this all comes down to syntax and what looks best.

thevinster planned changes to this revision.Dec 13 2021, 6:27 PM

oontvoo added inline comments.Dec 13 2021, 6:28 PM

lld/MachO/Writer.cpp
1044–1047
1045	nit: no brace
1100–1111	why is this change needed?

Harbormaster completed remote builds in B139116: Diff 394103.Dec 13 2021, 6:35 PM

lld/MachO/Writer.cpp
1100–1111	This commit is meant as a way to show the different ways we could be using `ThreadPool`. It's more of an RFC (See https://reviews.llvm.org/D115416#3190992). As far as why we want to change it (per @int3's suggestion), we want to change all our uses of `parallelForEach` in favor of `ThreadPool` so we can control the total number of threads.

int3 added inline comments.Dec 13 2021, 9:18 PM

lld/MachO/Writer.cpp
68	so can we rename `writerThreadPool` to just `threadPool`? :)
1100–1111	we should be reusing the `writerThreadPool` here instead of creating a `writeUuidThreadPool`... each ThreadPool maintains its own counter of the number of threads spawned

thevinster added inline comments.Dec 13 2021, 9:23 PM

lld/MachO/Writer.cpp
68	I'll clean this up. This revision is more of a "which option do we want to go for first" before I clean up all the nit details
1100–1111	Okay, so it sounds like we want to go for Option 2 then?

Oh yeah sorry I forgot to reply to your other comment. Yes, I'd prefer option 2. We should use parallelForEach whenever possible, and fall back to the single class-level threadPool whenever we want to fork/join at multiple different points.

oontvoo added inline comments.Dec 14 2021, 6:24 AM

lld/MachO/Writer.cpp
1100–1111	we should be reusing the `writerThreadPool` here instead of creating a `writeUuidThreadPool`... each ThreadPool maintains its own counter of the number of threads spawned Yeah +1 to this. Maybe my question was too terse - I was asking why you needed to switch paranllelForEachN for a more manual approach but NOT used the `writerThreadPool`.

In D115416#3189669, @int3 wrote:

@thakis I think they're orthogonal and equally useful improvements.

Absolutely agreed :)

thevinster marked 14 inline comments as done.Dec 14 2021, 3:15 PM

Address feedback + use option 2

Harbormaster completed remote builds in B139311: Diff 394394.Dec 14 2021, 3:30 PM

oontvoo added inline comments.Dec 14 2021, 4:53 PM

lld/MachO/Writer.cpp
1047–1052	No need to enqueue a NULL `osec`

oontvoo added inline comments.Dec 14 2021, 4:55 PM

lld/MachO/Writer.cpp
1047–1052	(although I'm not sure it could ever be null here )

lgtm, thanks!

lld/MachO/Writer.cpp
1047–1052	+1

This revision is now accepted and ready to land.Dec 14 2021, 6:26 PM

LG - thanks for putting up with the nits!

Prevent writing map files on the critical path

Maybe a better name is something like "Make writing map file asynchronous"

thevinster retitled this revision from [lld-macho] Prevent writing map files on the critical path to [lld-macho] Make writing map file asynchronous.Dec 15 2021, 4:35 PM

thevinster marked 3 inline comments as done.

Don't enqueue if null osec

This revision was landed with ongoing or failed builds.Dec 15 2021, 4:41 PM

Closed by commit rGd17b092fe690: [lld-macho] Make writing map file asynchronous (authored by thevinster). · Explain Why

This revision was automatically updated to reflect the committed changes.

thevinster added a commit: rGd17b092fe690: [lld-macho] Make writing map file asynchronous.

Harbormaster completed remote builds in B139531: Diff 394681.Dec 15 2021, 5:07 PM

crownyanguan added a subscriber: crownyanguan.Jan 9 2022, 11:36 PM

crownyanguan added inline comments.

lld/MachO/Writer.cpp
1111	This change will lead build failure in windows, like this: W:\ll_1074497988\20211217_000000\llvm\lld\MachO\Writer.cpp(1111): error C2365: 'i': redefinition; previous definition was 'data variable' W:\ll_1074497988\20211217_000000\llvm\lld\MachO\Writer.cpp(1109): note: see declaration of 'i' c:/Program files (x86)/Microsoft Visual Studio/2017/Professional/VC/Tools/MSVC/14.16.27023/include\functional(1914): fatal error C1060: compiler is out of heap space

thevinster mentioned this in rG7a161eb43b28: [lld-macho] Fix shadowed variable.Jan 10 2022, 12:25 AM

lld/MachO/Writer.cpp
1111	Thanks for flagging! I don't have a windows machine, but https://reviews.llvm.org/rG7a161eb43b280a6194fef5b65a81ef59921ebf02 should solve the shadowing error. I'm not sure why this would be causing heap issues but could you try re-running your build again and see if it still reproduces?

int3 mentioned this in D117069: [lld-macho] Initialize separate time trace profiler for mapfile worker.Jan 11 2022, 5:10 PM

int3 mentioned this in rGe976c457c5a3: [lld-macho] Initialize separate time trace profiler for mapfile worker.Jan 11 2022, 5:45 PM

crownyanguan added inline comments.Jan 16 2022, 6:15 PM

lld/MachO/Writer.cpp
1111	I dont't whether if this fixes, but, currently, it built pass in windows.

thakis mentioned this in D137368: [lld-macho] Overhaul map file code.Nov 8 2022, 1:42 PM

Revision Contents

Path

Size

lld/

MachO/

Writer.cpp

25 lines

Diff 394685

lld/MachO/Writer.cpp

Show All 23 Lines

#include "lld/Common/ErrorHandler.h" #include "lld/Common/ErrorHandler.h"

#include "lld/Common/Memory.h" #include "lld/Common/Memory.h"

#include "llvm/BinaryFormat/MachO.h" #include "llvm/BinaryFormat/MachO.h"

#include "llvm/Config/llvm-config.h" #include "llvm/Config/llvm-config.h"

#include "llvm/Support/LEB128.h" #include "llvm/Support/LEB128.h"

#include "llvm/Support/MathExtras.h" #include "llvm/Support/MathExtras.h"

#include "llvm/Support/Parallel.h" #include "llvm/Support/Parallel.h"

#include "llvm/Support/Path.h" #include "llvm/Support/Path.h"

#include "llvm/Support/ThreadPool.h"

#include "llvm/Support/TimeProfiler.h" #include "llvm/Support/TimeProfiler.h"

#include "llvm/Support/xxhash.h" #include "llvm/Support/xxhash.h"

#include <algorithm> #include <algorithm>

using namespace llvm; using namespace llvm;

using namespace llvm::MachO; using namespace llvm::MachO;

using namespace llvm::sys; using namespace llvm::sys;

Show All 19 Lines public:

void openFile(); void openFile();

void writeSections(); void writeSections();

void writeUuid(); void writeUuid();

void writeCodeSignature(); void writeCodeSignature();

void writeOutputFile(); void writeOutputFile();

template <class LP> void run(); template <class LP> void run();

ThreadPool threadPool;

int3Unsubmitted

Done

super nit, but "writer" here seems redundant since this is the only threadpool used in Writer

int3: super nit, but "writer" here seems redundant since this is the only threadpool used in Writer

thevinsterAuthorUnsubmitted

Done

I think it might be more relevant now we that want to use it in finalizeLinkEditSegment. Any reason why we want to instantiate it in the run scope as opposed to having it as a member of the class? It's possible we may want to use it elsewhere within the Writer class? In fact, scanning usages of parallelForEach{N} also shows writeUuid benefiting from using async as well. In that case, it makes sense to put it at the class level scope.

thevinster: I think it might be more relevant now we that want to use it in `finalizeLinkEditSegment`. Any…

int3Unsubmitted

Done

I have no issues with the scope. I'm just saying that having "writer" in the variable name seems redundant

int3: I have no issues with the scope. I'm just saying that having "writer" in the variable name…

thevinsterAuthorUnsubmitted

Done

My bad. I misinterpreted the statement. 👍

thevinster: My bad. I misinterpreted the statement. 👍

int3Unsubmitted

Done

so can we rename writerThreadPool to just threadPool? :)

int3: so can we rename `writerThreadPool` to just `threadPool`? :)

thevinsterAuthorUnsubmitted

Done

I'll clean this up. This revision is more of a "which option do we want to go for first" before I clean up all the nit details

thevinster: I'll clean this up. This revision is more of a "which option do we want to go for first" before…

std::unique_ptr<FileOutputBuffer> &buffer; std::unique_ptr<FileOutputBuffer> &buffer;

uint64_t addr = 0; uint64_t addr = 0;

uint64_t fileOff = 0; uint64_t fileOff = 0;

MachHeaderSection *header = nullptr; MachHeaderSection *header = nullptr;

StringTableSection *stringTableSection = nullptr; StringTableSection *stringTableSection = nullptr;

SymtabSection *symtabSection = nullptr; SymtabSection *symtabSection = nullptr;

IndirectSymtabSection *indirectSymtabSection = nullptr; IndirectSymtabSection *indirectSymtabSection = nullptr;

CodeSignatureSection *codeSignatureSection = nullptr; CodeSignatureSection *codeSignatureSection = nullptr;

▲ Show 20 Lines • Show All 955 Lines • ▼ Show 20 Lines std::vector<LinkEditSection *> linkEditSections{

in.weakBinding, in.weakBinding,

in.lazyBinding, in.lazyBinding,

in.exports, in.exports,

symtabSection, symtabSection,

indirectSymtabSection, indirectSymtabSection,

dataInCodeSection, dataInCodeSection,

functionStartsSection, functionStartsSection,

}; };

parallelForEach(linkEditSections, [](LinkEditSection *osec) { SmallVector<std::shared_future<void>> threadFutures;

threadFutures.reserve(linkEditSections.size());

for (LinkEditSection *osec : linkEditSections)

if (osec) if (osec)

osec->finalizeContents(); threadFutures.emplace_back(threadPool.async(

}); [](LinkEditSection *osec) { osec->finalizeContents(); }, osec));

oontvooUnsubmitted

Done

nit: no brace

oontvoo: nit: no brace

for (std::shared_future<void> &future : threadFutures)

future.wait();

oontvooUnsubmitted

Done

functionStartsSection,

};

SmallVector<std::shared_future<void>> threadFutures;

+ threadFutures.reserve(linkEditSections.size());

for (LinkEditSection *osec : linkEditSections) {

oontvoo:

// Now that __LINKEDIT is filled out, do a proper calculation of its // Now that __LINKEDIT is filled out, do a proper calculation of its

// addresses and offsets. // addresses and offsets.

linkEditSegment->addr = addr; linkEditSegment->addr = addr;

assignAddresses(linkEditSegment); assignAddresses(linkEditSegment);

oontvooUnsubmitted

Done

for (LinkEditSection *osec : linkEditSections)

- threadFutures.emplace_back(threadPool.async(

- [](LinkEditSection *osec) {

- if (osec)

+ if (osec)

+ threadFutures.emplace_back(threadPool.async(

+ [](LinkEditSection *osec) {

osec->finalizeContents();

- },

- osec));

+ }, osec));

for (std::shared_future<void> &future : threadFutures)

No need to enqueue a NULL osec

oontvoo: No need to enqueue a NULL `osec`

oontvooUnsubmitted

Done

(although I'm not sure it could ever be null here )

oontvoo: (although I'm not sure it could ever be null here )

int3Unsubmitted

Done

int3: +1

// No need to page-align fileOff / addr here since this is the last segment. // No need to page-align fileOff / addr here since this is the last segment.

linkEditSegment->vmSize = addr - linkEditSegment->addr; linkEditSegment->vmSize = addr - linkEditSegment->addr;

linkEditSegment->fileSize = fileOff - linkEditSegment->fileOff; linkEditSegment->fileSize = fileOff - linkEditSegment->fileOff;

} }

void Writer::assignAddresses(OutputSegment *seg) { void Writer::assignAddresses(OutputSegment *seg) {

seg->fileOff = fileOff; seg->fileOff = fileOff;

Show All 31 Lines for (const OutputSection *osec : seg->getSections())

osec->writeTo(buf + osec->fileOff); osec->writeTo(buf + osec->fileOff);

} }

// In order to utilize multiple cores, we first split the buffer into chunks, // In order to utilize multiple cores, we first split the buffer into chunks,

// compute a hash for each chunk, and then compute a hash value of the hash // compute a hash for each chunk, and then compute a hash value of the hash

// values. // values.

void Writer::writeUuid() { void Writer::writeUuid() {

TimeTraceScope timeScope("Computing UUID"); TimeTraceScope timeScope("Computing UUID");

ArrayRef<uint8_t> data{buffer->getBufferStart(), buffer->getBufferEnd()}; ArrayRef<uint8_t> data{buffer->getBufferStart(), buffer->getBufferEnd()};

unsigned chunkCount = parallel::strategy.compute_thread_count() * 10; unsigned chunkCount = parallel::strategy.compute_thread_count() * 10;

// Round-up integer division // Round-up integer division

size_t chunkSize = (data.size() + chunkCount - 1) / chunkCount; size_t chunkSize = (data.size() + chunkCount - 1) / chunkCount;

std::vector<ArrayRef<uint8_t>> chunks = split(data, chunkSize); std::vector<ArrayRef<uint8_t>> chunks = split(data, chunkSize);

std::vector<uint64_t> hashes(chunks.size()); std::vector<uint64_t> hashes(chunks.size());

parallelForEachN(0, chunks.size(), SmallVector<std::shared_future<void>> threadFutures;

[&](size_t i) { hashes[i] = xxHash64(chunks[i]); }); threadFutures.reserve(chunks.size());

for (size_t i = 0; i < chunks.size(); ++i)

threadFutures.emplace_back(threadPool.async(

[&](size_t i) { hashes[i] = xxHash64(chunks[i]); }, i));

oontvooUnsubmitted

Done

why is this change needed?

oontvoo: why is this change needed?

thevinsterAuthorUnsubmitted

Done

This commit is meant as a way to show the different ways we could be using ThreadPool. It's more of an RFC (See https://reviews.llvm.org/D115416#3190992). As far as why we want to change it (per @int3's suggestion), we want to change all our uses of parallelForEach in favor of ThreadPool so we can control the total number of threads.

thevinster: This commit is meant as a way to show the different ways we could be using `ThreadPool`. It's…

int3Unsubmitted

Done

we should be reusing the writerThreadPool here instead of creating a writeUuidThreadPool... each ThreadPool maintains its own counter of the number of threads spawned

int3: we should be reusing the `writerThreadPool` here instead of creating a `writeUuidThreadPool`...

thevinsterAuthorUnsubmitted

Done

Okay, so it sounds like we want to go for Option 2 then?

thevinster: Okay, so it sounds like we want to go for Option 2 then?

oontvooUnsubmitted

Done

we should be reusing the writerThreadPool here instead of creating a writeUuidThreadPool... each ThreadPool maintains its own counter of the number of threads spawned

Yeah +1 to this.
Maybe my question was too terse - I was asking why you needed to switch paranllelForEachN for a more manual approach but NOT used the writerThreadPool.

oontvoo: > we should be reusing the `writerThreadPool` here instead of creating a `writeUuidThreadPool`..

crownyanguanUnsubmitted

Done

This change will lead build failure in windows, like this:
W:\ll_1074497988\20211217_000000\llvm\lld\MachO\Writer.cpp(1111): error C2365: 'i': redefinition; previous definition was 'data variable'
W:\ll_1074497988\20211217_000000\llvm\lld\MachO\Writer.cpp(1109): note: see declaration of 'i'
c:/Program files (x86)/Microsoft Visual Studio/2017/Professional/VC/Tools/MSVC/14.16.27023/include\functional(1914): fatal error C1060: compiler is out of heap space

crownyanguan: This change will lead build failure in windows, like this: W…

thevinsterAuthorUnsubmitted

Done

Thanks for flagging! I don't have a windows machine, but https://reviews.llvm.org/rG7a161eb43b280a6194fef5b65a81ef59921ebf02 should solve the shadowing error. I'm not sure why this would be causing heap issues but could you try re-running your build again and see if it still reproduces?

thevinster: Thanks for flagging! I don't have a windows machine, but https://reviews.llvm.

crownyanguanUnsubmitted

Done

I dont't whether if this fixes, but, currently, it built pass in windows.

crownyanguan: I dont't whether if this fixes, but, currently, it built pass in windows.

for (std::shared_future<void> &future : threadFutures)

future.wait();

uint64_t digest = xxHash64({reinterpret_cast<uint8_t *>(hashes.data()), uint64_t digest = xxHash64({reinterpret_cast<uint8_t *>(hashes.data()),

hashes.size() * sizeof(uint64_t)}); hashes.size() * sizeof(uint64_t)});

uuidCommand->writeUuid(digest); uuidCommand->writeUuid(digest);

} }

void Writer::writeCodeSignature() { void Writer::writeCodeSignature() {

if (codeSignatureSection) if (codeSignatureSection)

codeSignatureSection->writeHashes(buffer->getBufferStart()); codeSignatureSection->writeHashes(buffer->getBufferStart());

Show All 31 Lines template <class LP> void Writer::run() {

// After this point, we create no new segments; HOWEVER, we might // After this point, we create no new segments; HOWEVER, we might

// yet create branch-range extension thunks for architectures whose // yet create branch-range extension thunks for architectures whose

// hardware call instructions have limited range, e.g., ARM(64). // hardware call instructions have limited range, e.g., ARM(64).

// The thunks are created as InputSections interspersed among // The thunks are created as InputSections interspersed among

// the ordinary __TEXT,_text InputSections. // the ordinary __TEXT,_text InputSections.

sortSegmentsAndSections(); sortSegmentsAndSections();

createLoadCommands<LP>(); createLoadCommands<LP>();

finalizeAddresses(); finalizeAddresses();

threadPool.async(writeMapFile);

finalizeLinkEditSegment(); finalizeLinkEditSegment();

oontvooUnsubmitted

Done

finalizeAddresses();

- std::future<void> writeMapAsync(std::async(writeMapFile));

finalizeLinkEditSegment();

+ std::future<void> writeMapAsync(std::async(writeMapFile));

writeOutputFile();

oontvoo:

int3Unsubmitted

Done

yeah, I don't think the map file contains anything from LinkEdit, but just to be safe I would move it after everything's finalized. I'm assuming writeOutputFile takes longer than writeMapAsync anyway

int3: yeah, I don't think the map file contains anything from LinkEdit, but just to be safe I would…

oontvooUnsubmitted

Done

PS: actually, dont you want std::future<void> writeMapAsync (std::async(std::launch::async, []{ writeMapFile(); })); ?

oontvoo: PS: actually, dont you want `std::future<void> writeMapAsync (std::async(std::launch::async…

int3Unsubmitted

Done

yup, based on https://en.cppreference.com/w/cpp/thread/async I think @oontvoo 's right.

Also, that page says

If the std::future obtained from std::async is not moved from or bound to a reference, the destructor of the std::future will block at the end of the full expression until the asynchronous operation completes

so I don't think the get() call below is necessary?

int3: yup, based on https://en.cppreference.com/w/cpp/thread/async I think @oontvoo 's right. Also…

writeMapFile();

writeOutputFile(); writeOutputFile();

int3Unsubmitted

Done

so as my earlier block comment was saying, we should pass writerThreadPool to finalizeLinkEditSegment too and use that in lieu of parallelForEachN, so that we maintain control of the total number of threads.

int3: so as my earlier block comment was saying, we should pass `writerThreadPool` to…

thevinsterAuthorUnsubmitted

Done

Oh I see. It wasn't clear from your previous comment that you wanted to replace parallelForEachN with it. I'll change.

thevinster: Oh I see. It wasn't clear from your previous comment that you wanted to replace…

} }

oontvooUnsubmitted

Done

createLoadCommands<LP>();

finalizeAddresses();

- std::future<void> writeMapAsync(std::async(writeMapFile));

- finalizeLinkEditSegment();

- writeOutputFile();

- // Ensure writeMapFile completes before the main thread returns

- // to prevent undefined behavior.

- writeMapAsync.get();

- }

+ // Run writeMapFile in a separate thread because it's slower

+ parallelForEachN(0, 1, [&](size_t id) {

+ if (id == 0) {

+ finalizeLinkEditSegment();

+ writeOutputFile();

+ } else {

+ writeMapFile();

+ }

+ });}

template <class LP> void macho::writeResult() { Writer().run<LP>(); }

perhaps something like this ^ ?

oontvoo: perhaps something like this ^ ?

template <class LP> void macho::writeResult() { Writer().run<LP>(); } template <class LP> void macho::writeResult() { Writer().run<LP>(); }

void macho::resetWriter() { LCDylib::resetInstanceCount(); } void macho::resetWriter() { LCDylib::resetInstanceCount(); }

void macho::createSyntheticSections() { void macho::createSyntheticSections() {

in.header = make<MachHeaderSection>(); in.header = make<MachHeaderSection>();

if (config->dedupLiterals) { if (config->dedupLiterals) {

Show All 35 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Make writing map file asynchronousClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 394685

lld/MachO/Writer.cpp

[lld-macho] Make writing map file asynchronous
ClosedPublic