This is an archive of the discontinued LLVM Phabricator instance.

COFF: Open and map input files asynchronously on Windows.
ClosedPublic

Authored by pcc on Dec 14 2016, 11:00 AM.

Details

Summary

Profiling revealed that the majority of lld's execution time on Windows was
spent opening and mapping input files. We can reduce this cost significantly
by performing these operations asynchronously.

This change introduces a queue for all operations on input file data. When
we discover that we need to load a file (for example, when we find a lazy
archive member that resolves an undefined symbol, or when we read a linker
directive to
load a file from disk), the file operation is launched using a future and
the symbol resolution operation is enqueued. This implies another change
to symbol resolution semantics, but it seems to be harmless ("ninja All"
in Chromium still succeeds).
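
As a rough illustration of the scheme (a sketch only; identifiers such as
createFutureForFile and enqueuePath are illustrative, not necessarily the
names used in the patch):

    // Launch the open+map on a background thread and queue the work that
    // needs the file's contents. Error handling is simplified.
    #include "llvm/Support/ErrorHandling.h"
    #include "llvm/Support/MemoryBuffer.h"
    #include <functional>
    #include <future>
    #include <memory>
    #include <string>
    #include <system_error>
    #include <utility>
    #include <vector>

    using MBErrPair =
        std::pair<std::unique_ptr<llvm::MemoryBuffer>, std::error_code>;

    static std::future<MBErrPair> createFutureForFile(std::string Path) {
      return std::async(std::launch::async, [=] {
        auto MBOrErr = llvm::MemoryBuffer::getFile(Path);
        if (!MBOrErr)
          return MBErrPair(nullptr, MBOrErr.getError());
        return MBErrPair(std::move(*MBOrErr), std::error_code());
      });
    }

    // FIFO of deferred symbol-resolution steps, drained by the driver.
    static std::vector<std::function<void()>> TaskQueue;

    static void enqueuePath(std::string Path) {
      auto F = std::make_shared<std::future<MBErrPair>>(
          createFutureForFile(std::move(Path)));
      TaskQueue.push_back([=] {
        MBErrPair MB = F->get(); // blocks only if the open/map isn't done
        if (MB.second)
          llvm::report_fatal_error("could not open file");
        // ... hand MB.first to symbol resolution ...
      });
    }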

To measure the perf impact of this change I linked Chromium's chrome_child.dll
with both thin and fat archives.

Thin archives:

Before (median of 5 runs): 19.50s
After: 10.93s

Fat archives:

Before: 12.00s
After: 9.90s

On Linux I found that doing this asynchronously had a negative effect on
performance, probably because the cost of mapping a file is small enough that
it becomes outweighed by the cost of managing the futures. So on non-Windows
platforms I use the deferred execution strategy.
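
The distinction here is between the std::launch::async and
std::launch::deferred policies. A sketch of how the selection might look
(the platform check is illustrative):

    // On Windows, open+map is expensive enough to be worth a thread;
    // elsewhere, defer the work until the result is first requested so
    // we don't pay for thread and future management.
    #include <future>
    #include <utility>

    template <class Callable> auto launchFileOp(Callable Fn) {
    #ifdef _WIN32
      return std::async(std::launch::async, std::move(Fn));
    #else
      return std::async(std::launch::deferred, std::move(Fn));
    #endif
    }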

Diff Detail

Repository
rL LLVM

Event Timeline

pcc updated this revision to Diff 81422. Dec 14 2016, 11:00 AM
pcc retitled this revision to COFF: Open and map input files asynchronously on Windows.
pcc updated this object.
pcc added a reviewer: ruiu.
pcc added a subscriber: llvm-commits.
ruiu edited edge metadata. Dec 14 2016, 1:23 PM

Which is actually slow: open or mmap? My wild guess is that mmap is the slow part. If so, can we open the file first to get an error early and then mmap? That would simplify std::future<MBErrPair> to std::future<MemoryBuffer>.
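
A sketch of what that suggestion could look like (error handling simplified;
not the exact code that later landed):

    // Open on the main thread so an open error surfaces immediately;
    // only the map runs on the background thread. Note the descriptor
    // stays open until the map completes (relevant to the discussion
    // below about descriptor limits).
    #include "llvm/Support/ErrorHandling.h"
    #include "llvm/Support/FileSystem.h"
    #include "llvm/Support/MemoryBuffer.h"
    #include <future>
    #include <memory>
    #include <string>
    #ifdef _WIN32
    #include <io.h>
    #else
    #include <unistd.h>
    #endif

    static std::future<std::unique_ptr<llvm::MemoryBuffer>>
    openThenMapAsync(std::string Path) {
      int FD;
      if (std::error_code EC = llvm::sys::fs::openFileForRead(Path, FD))
        llvm::report_fatal_error("cannot open " + Path + ": " + EC.message());
      return std::async(std::launch::async, [FD, Path] {
        // Map the already-open descriptor; -1 means "compute the size".
        auto MBOrErr =
            llvm::MemoryBuffer::getOpenFile(FD, Path, /*FileSize=*/-1);
        // The descriptor is no longer needed once the file is mapped.
    #ifdef _WIN32
        _close(FD);
    #else
        close(FD);
    #endif
        if (!MBOrErr) // rarer than an open failure; OK to die on the thread
          llvm::report_fatal_error("cannot map " + Path);
        return std::move(*MBOrErr);
      });
    }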

lld/COFF/Driver.cpp
794 ↗(On Diff #81422)

Is it enough to run this only once? In the code above, we repeat until we converge.

lld/COFF/InputFiles.h
86 ↗(On Diff #81422)

Update comment.

pcc added a comment. Dec 14 2016, 4:09 PM
In D27768#622639, @ruiu wrote:

Which is actually slow: open or mmap?

The MSVC profiler recorded 7510 "samples" inside getOpenFileImpl (which just maps the file), compared with 7683 inside getFileAux (which opens the file and then calls getOpenFileImpl). Since the latter figure includes the former, the open itself accounts for only a small fraction of the time; the mapping dominates.

My wild guess is that mmap is the slow part. If so, can we open the file first to get an error early and then mmap? That would simplify std::future<MBErrPair> to std::future<MemoryBuffer>.

That's a good point, I'll do that. Although we could still fail to map the file, it's probably fine to exit from the mapping thread, since mapping is much less likely to fail than opening.

lld/COFF/Driver.cpp
794 ↗(On Diff #81422)

We would only need to run addCombinedLTOObjects again if the native COFF file contained a compiler-generated reference to a bitcode symbol. This is unlikely to occur in the COFF linker because the compiler will generally only create references to the runtime library, which MSVC of course ships as regular object files. We previously tried to handle that situation by reporting an error (see SymbolTable.cpp:380 in the old code), but that code was untested. I didn't think it was worth preserving the error, as this is basically a "don't care" situation, but I guess it wouldn't be too hard to do that and write a test.

lld/COFF/InputFiles.h
86 ↗(On Diff #81422)

Will do.

pcc updated this revision to Diff 81508. Dec 14 2016, 6:00 PM
pcc marked an inline comment as done.
pcc edited edge metadata.
  • Only map the file asynchronously
pcc added a comment. Dec 14 2016, 6:00 PM
In D27768#622969, @pcc wrote:

That's a good point, I'll do that. Although we could still fail to map the file, it's probably fine to exit from the mapping thread, since mapping is much less likely to fail than opening.

Done. Although the code certainly looks cleaner, now that I've thought about it more I'm not sure it's a good idea. Since we now leave file descriptors open for a longer period of time, we risk exceeding the limit on the number of open file descriptors, whereas with the previous patch a reasonable implementation of std::future would bound the number of concurrent tasks and hence the number of open file descriptors. One way to avoid that would be to start dequeuing tasks once the queue reaches a predefined size limit, though that would make the behavior a little harder to reason about.
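
One possible shape for that size limit, as a sketch only (the cap and the
identifiers are made up):

    // If too many file operations are in flight, force the oldest one to
    // completion before queueing another; this bounds the number of open
    // descriptors at MaxInFlight.
    #include <cstddef>
    #include <deque>
    #include <functional>
    #include <utility>

    constexpr std::size_t MaxInFlight = 256; // made-up cap

    static std::deque<std::function<void()>> Queue;

    static void enqueueBounded(std::function<void()> Task) {
      if (Queue.size() >= MaxInFlight) {
        auto Oldest = std::move(Queue.front());
        Queue.pop_front();
        Oldest(); // forces that file's future, releasing its descriptor
      }
      Queue.push_back(std::move(Task));
    }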

That said, the new latency numbers are 9.85s for fat archives and 11.08s for thin archives, which seems reasonable to me.

ruiu added a comment. Dec 14 2016, 6:41 PM

I don't know for sure, but I guess that the resources associated with a file descriptor are freed only when all references to it have been released. If you mmap a file descriptor and then close it, the close is almost a no-op at that point (it just decrements a kernel-internal reference count), and the resources associated with the file are released when you munmap, no?
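
For POSIX mmap this is indeed the documented behavior: closing the
descriptor does not invalidate the mapping. A minimal illustration (the
file path is arbitrary):

    // close() right after mmap(): the mapping keeps the file's resources
    // alive, and they are released only at munmap().
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main() {
      int fd = open("/etc/hosts", O_RDONLY); // arbitrary example file
      struct stat st;
      if (fd < 0 || fstat(fd, &st) != 0 || st.st_size == 0)
        return 1;
      void *p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
      close(fd); // nearly a no-op; the mapping still references the file
      if (p == MAP_FAILED)
        return 1;
      fwrite(p, 1, st.st_size, stdout); // pages remain valid after close
      munmap(p, st.st_size);            // resources released here
      return 0;
    }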

pcc added a comment. Dec 14 2016, 7:42 PM
In D27768#623208, @ruiu wrote:

I don't know for sure, but I guess that the resources associated with a file descriptor are freed only when all references to it have been released. If you mmap a file descriptor and then close it, the close is almost a no-op at that point (it just decrements a kernel-internal reference count), and the resources associated with the file are released when you munmap, no?

I don't think that's quite right on Windows. According to [0], on Windows file descriptors are distinct from OS-level handles and are a much more limited resource: the default limit on the number of file descriptors is 512, it can be raised to at most 2048, and OS-level handles are effectively unlimited. Until now we have only used file descriptors ephemerally (on Windows, MemoryBuffer::getFile creates a handle, creates a descriptor from it, extracts the handle from the descriptor to map the file, and finally closes the descriptor), so this hasn't been a problem before. But now descriptors can potentially live much longer.

All of that suggests a cleaner fix: change LLVM's MemoryBuffer API so that it uses handles on Windows. That's probably way outside of the scope of this patch, though. Perhaps we can go back to the earlier version with a FIXME?

[0] http://stackoverflow.com/questions/870173/is-there-a-limit-on-number-of-open-files-in-windows
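
For reference, a sketch of what a handle-only path could look like (plain
Win32 calls; this is not LLVM's actual MemoryBuffer implementation). No CRT
descriptor is created at any point, so the 512-descriptor limit never
applies:

    // Map a whole file using only OS-level handles; each handle can be
    // closed as soon as the next object exists, since every kernel object
    // keeps its predecessor alive. Caller unmaps with UnmapViewOfFile.
    #include <windows.h>

    const void *mapWholeFile(const wchar_t *Path, size_t &Size) {
      HANDLE File = CreateFileW(Path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                                OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
      if (File == INVALID_HANDLE_VALUE)
        return nullptr;
      LARGE_INTEGER FileSize;
      if (!GetFileSizeEx(File, &FileSize) || FileSize.QuadPart == 0) {
        CloseHandle(File);
        return nullptr;
      }
      HANDLE Mapping =
          CreateFileMappingW(File, nullptr, PAGE_READONLY, 0, 0, nullptr);
      CloseHandle(File); // the mapping object keeps the file open
      if (!Mapping)
        return nullptr;
      const void *Data = MapViewOfFile(Mapping, FILE_MAP_READ, 0, 0, 0);
      CloseHandle(Mapping); // the view keeps the mapping alive
      Size = static_cast<size_t>(FileSize.QuadPart);
      return Data;
    }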

ruiu accepted this revision. Dec 14 2016, 7:44 PM
ruiu edited edge metadata.

Sure, I'm fine either way. LGTM.

This revision is now accepted and ready to land. Dec 14 2016, 7:44 PM
This revision was automatically updated to reflect the committed changes.
pcc added a comment. Dec 14 2016, 8:13 PM

Thanks, I went with the previous approach with a FIXME.