This is a compile-time optimization: leaving a large file to be processed
at the end hurts parallelism.
The heuristic used right now is the input buffer size; however, we
may want to consider the number of functions to import or the
number of different files to load for importing as well.
(port from ThinLTOCodeGenerator.cpp)
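
For readers who want a concrete picture of the heuristic, here is a minimal sketch. It assumes that "not keeping a large file for the end" means processing inputs in order of decreasing buffer size. The `InputModule` struct and the function name are hypothetical stand-ins for illustration only; this is not the code from the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for an input module; only the buffer size matters here.
struct InputModule {
  const char *Name;
  std::size_t BufferSize; // size of the input buffer, in bytes
};

// Returns indices into Modules, ordered so that the largest buffers come
// first. Ties keep their original relative order (stable sort).
std::vector<std::size_t>
generateModulesOrdering(const std::vector<InputModule> &Modules) {
  std::vector<std::size_t> Order(Modules.size());
  std::iota(Order.begin(), Order.end(), std::size_t{0});
  std::stable_sort(Order.begin(), Order.end(),
                   [&](std::size_t L, std::size_t R) {
                     return Modules[L].BufferSize > Modules[R].BufferSize;
                   });
  return Order;
}
```

A thread pool that picks up tasks in this order starts the most expensive inputs first, so a single large module is less likely to be left running alone at the end of the link.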
Details
- Reviewers
tejohnson
Event Timeline
Code changes look fine. But what is the impact on memory? I wonder if this will bloat the peak memory because more of the large modules will get run in parallel.
The peak memory when linking clang with 4 threads goes up from 1.65GB to 2.22GB with this patch.
This makes me wonder if we could have a smarter scheduler that would balance the inputs to interleave the large ones with as many small ones as needed.
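
One possible shape for such an interleaving order, purely as a sketch: sort by size, then alternate between taking one input from the large end and a fixed number from the small end. The function name and the `SmallPerLarge` parameter are hypothetical; a real scheduler would presumably pick the ratio from memory or time estimates rather than a constant.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch: returns indices into Sizes such that each large input
// is followed by up to SmallPerLarge of the smallest remaining inputs.
std::vector<std::size_t>
interleaveBySize(const std::vector<std::size_t> &Sizes,
                 std::size_t SmallPerLarge) {
  // Indices sorted by decreasing size.
  std::vector<std::size_t> Sorted(Sizes.size());
  std::iota(Sorted.begin(), Sorted.end(), std::size_t{0});
  std::stable_sort(Sorted.begin(), Sorted.end(),
                   [&](std::size_t L, std::size_t R) {
                     return Sizes[L] > Sizes[R];
                   });

  std::vector<std::size_t> Order;
  Order.reserve(Sorted.size());
  std::size_t Front = 0, Back = Sorted.size();
  while (Front < Back) {
    Order.push_back(Sorted[Front++]); // next largest input
    for (std::size_t I = 0; I < SmallPerLarge && Front < Back; ++I)
      Order.push_back(Sorted[--Back]); // next smallest input
  }
  return Order;
}
```

The largest input is still scheduled first, but fewer large inputs are in flight at the same time, which is the trade-off between link time and peak memory discussed above.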
Maybe another concern could be a smart scheduler for "locality" (if A imports from B and B imports from A, schedule them back-to-back as the files are more likely to be mapped in memory). But this seems quite secondary.
This patch was motivated by a use case (llvm-tblgen maybe?) where a single large file was taking almost as long as *all* the others to process and was scheduled late. The link time improvement was really important in this case.