Keeping the compile units in memory is expensive. For the single threaded case we allocate them in the analyze part and deallocate them again once we've finished cloning. This poses a problem in the single threaded case where we did all the analysis first followed by all the cloning. This meant we had all the link context in memory right after analyzing finished.
This patch changes the way we order work in the single threaded case. Instead of doing all the analysis and cloning in serial, we now interleave the two so we can deallocate the memory as soon as a file is processed. The result is binary identical and peak memory usage went down from 13.43GB to 5.73GB for a debug build of trunk clang.