This is an archive of the discontinued LLVM Phabricator instance.

WIP: llvm-buildozer
Needs Review · Public

Authored by aganea on Aug 21 2020, 8:08 AM.

Details

Summary

This is a very early prototype of llvm-buildozer, a tool I presented last year at the LLVM conference: https://www.youtube.com/watch?v=usPL_DROn4k

The overall goal is to switch the source code build model from a pool of processes to a pool of threads.

Previously, a tool like ninja would invoke the compiler and linker EXEs based on a list of build actions, which are in turn derived from a .ninja script listing the TUs and targets to compile. This model has the adverse effect of starting a new EXE for each TU, and of repeating the initialization, the compilation-environment detection, the loading or mmap'ing of source files, etc. for each EXE instance. This comes with a high execution cost on Windows.

Our llvm-buildozer tool instead takes a modified .CDB as input, loads all the necessary EXEs as DLLs once, and invokes each build action internally, on a pool of threads, without tearing down the EXEs, re-initializing the compiler, or resetting the heap. This evidently requires the compiler & linker EXEs to be thread-safe, which accounts for the vast majority of the changes in this patch. With our llvm-buildozer tool, rebuild times drop to roughly 60% of their previous value, tested on a 6-core machine with a real-world game project: 20 min before -> 11 min 50 sec after.
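
To make the model above concrete, here is a minimal sketch of the in-process dispatch, assuming a hypothetical invokeCompilerEntryPoint() standing in for the compiler/linker entry points resolved from the loaded DLLs. This is not the actual patch: dependency ordering between actions, error handling, and the DLL loading itself are all omitted.

```cpp
#include <atomic>
#include <string>
#include <thread>
#include <vector>

// One build action from the (modified) compilation database: a single
// compiler or linker command line.
struct BuildAction {
  std::vector<std::string> Args;
};

// Hypothetical stand-in for the in-process compiler/linker entry point that
// the real tool would resolve from the loaded DLLs.
static int invokeCompilerEntryPoint(const std::vector<std::string> &Args) {
  (void)Args;
  return 0; // stub
}

// Run all actions on a pool of threads inside the current process; the
// process state (heap, mmap'ed files, one-time initialization) is reused
// across actions instead of being torn down for each TU.
static void runActions(const std::vector<BuildAction> &Actions,
                       unsigned NumThreads) {
  std::atomic<size_t> Next{0};
  std::vector<std::thread> Workers;
  for (unsigned I = 0; I != NumThreads; ++I) {
    Workers.emplace_back([&] {
      // Each worker pulls the next action index until the list is exhausted.
      for (size_t Idx = Next.fetch_add(1); Idx < Actions.size();
           Idx = Next.fetch_add(1))
        invokeCompilerEntryPoint(Actions[Idx].Args);
    });
  }
  for (std::thread &T : Workers)
    T.join();
}
```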

As future work, if this idea passes RFC/community consensus, the next step is to work on inter-thread communication, such as sharing the state of the SourceManager or of the tokenizer for a start. An in-memory program repository, along the lines of SN Systems' ideas, could further avoid generating duplicate sections/optimizations across TUs.

This is an old prototype, untouched since the LLVM conference; I'm pasting it here only for reference purposes. I'm planning to get back to it at some point - in the meantime, feel free to pick anything interesting from this patch.

Diff Detail

Event Timeline

aganea created this revision. Aug 21 2020, 8:08 AM
aganea requested review of this revision. Aug 21 2020, 8:08 AM
aganea retitled this revision from WIP: llvm-buildozer 1/3 to WIP: llvm-buildozer 1/3 [llvm part].

When you extended the thread pool for high core counts on Windows, I was thinking the other way around: I wanted you to use distributed ThinLTO locally. Then each ninja job is single-threaded and ninja can schedule everything. This probably won't be a desirable solution for Windows.

If you can get benefits out of inter-thread communication, then I am getting interested again.

When you extended the thread pool for high core counts on Windows, I was thinking the other way around: I wanted you to use distributed ThinLTO locally. Then each ninja job is single-threaded and ninja can schedule everything. This probably won't be a desirable solution for Windows.

This has been suggested by Bruce Dawson here: https://github.com/ninja-build/ninja/issues/1638 - although I haven't seen the prototype he mentions, nor any figures. I feel that the direct and indirect costs of context switching between a pool of processes would be higher than for a pool of threads in a single long-running process, and that the gap would grow with the number of cores. But I remain open to any idea, as long as it improves build times on Windows.

aganea updated this revision to Diff 351663. Jun 12 2021, 8:56 AM
aganea retitled this revision from WIP: llvm-buildozer 1/3 [llvm part] to WIP: llvm-buildozer.

Rebase.

Herald added a reviewer: sstefan1.
Herald added a reviewer: MaskRay.
Herald added a reviewer: baziotis.
Herald added projects: Restricted Project, Restricted Project.
ychen added a subscriber: ychen. Jan 28 2022, 1:44 PM
Herald added a project: Restricted Project. Apr 19 2022, 12:57 PM
arsenm added inline comments. Apr 19 2022, 7:05 PM
llvm/lib/Analysis/MemorySSA.cpp
Line 89 (on Diff #351663)

Do command line flags like this really need to be thread local? Wouldn't every cl::opt need to be, if this one is?

aganea added inline comments. Apr 22 2022, 6:41 AM
llvm/lib/Analysis/MemorySSA.cpp
Line 89 (on Diff #351663)

This was only for the PoC.

Values pointed to by cl::opts -- that is, either the internal cl::opt data or the static variable pointed to by cl::location -- should be part of a context of some sort. LLD now has CommonLinkerContext plus a derived context per driver. We probably need a "context" class along the same lines for the LLVM libs. Should it be part of LLVMContextImpl? Some options are read a bit earlier, before the LLVMContext is created.
The cl::opts themselves can remain static.

Unless we assume cl::opts are just for debugging, and move to TableGen any such globals that affect multi-threaded compilation?
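
For illustration only, here is a minimal sketch of that direction, assuming a hypothetical CompilationOptionsContext class; the flag name below is made up, and this is not an existing LLVM API nor part of the patch:

```cpp
#include "llvm/Support/CommandLine.h"

using namespace llvm;

// The cl::opt stays static and is parsed once from the command line, as
// today. "verify-memoryssa-example" is a made-up name for this sketch.
static cl::opt<bool> VerifyMemorySSAOpt(
    "verify-memoryssa-example", cl::init(false), cl::Hidden,
    cl::desc("Example flag mirroring the MemorySSA verification option"));

// Hypothetical per-compilation context holding the values the libraries
// actually read, along the lines of LLD's CommonLinkerContext.
struct CompilationOptionsContext {
  bool VerifyMemorySSA = false;

  // Snapshot the globals once; each compilation thread then reads only its
  // own context, so the globals themselves don't need to become thread_local.
  static CompilationOptionsContext fromGlobalOpts() {
    CompilationOptionsContext Ctx;
    Ctx.VerifyMemorySSA = VerifyMemorySSAOpt;
    return Ctx;
  }
};
```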

nhaehnle removed a subscriber: nhaehnle. May 11 2023, 9:43 AM