Use separate doctrees for different Sphinx builders in order to
prevent race conditions caused by multiple Sphinx instances accessing
the same doctree cache in parallel.
For the Ninja generator we could instead put the sphinx commands into a custom job pool and limit the pool size to 1. This would remove the need for a spurious dependency.
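For reference, a sketch of what that could look like. Note this relies on tooling newer than what was available at the time: the JOB_POOL option on custom commands/targets only appeared in later CMake releases (3.15+), which is why the thread goes on to look for alternatives. Target and pool names here are illustrative:

# Sketch only: declare a Ninja job pool of depth 1 and run the sphinx
# target inside it, so at most one sphinx-build runs at a time.
# (Requires CMake 3.15+ for JOB_POOL on custom targets.)
set_property(GLOBAL PROPERTY JOB_POOLS sphinx_pool=1)
add_custom_target(docs-${project}-${builder}
  COMMAND ${SPHINX_EXECUTABLE} -b ${builder}
          "${CMAKE_CURRENT_SOURCE_DIR}"  # Source
          "${SPHINX_BUILD_DIR}"          # Output
  JOB_POOL sphinx_pool)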
The simple Ninja solution is actually to just specify USES_TERMINAL on the target creation, which limits execution to one such target at a time (CMake maps USES_TERMINAL to Ninja's console pool, which has depth 1).
If we do that, then this dependency could be added only for non-Ninja generators.
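In code, that could look roughly like this (the docs-${project}-${builder} target names are assumed from AddSphinxTarget.cmake's naming convention; the man/html pair is just an example):

# Sketch: on Ninja, USES_TERMINAL already serializes the targets through
# the console pool, so the artificial ordering is only needed elsewhere.
if(NOT CMAKE_GENERATOR MATCHES "Ninja")
  add_dependencies(docs-${project}-man docs-${project}-html)
endif()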
diff --git a/cmake/modules/AddSphinxTarget.cmake b/cmake/modules/AddSphinxTarget.cmake
index 045dc23..fbe4f95 100644
--- a/cmake/modules/AddSphinxTarget.cmake
+++ b/cmake/modules/AddSphinxTarget.cmake
@@ -24,7 +24,8 @@ function (add_sphinx_target builder project)
                     "${CMAKE_CURRENT_SOURCE_DIR}" # Source
                     "${SPHINX_BUILD_DIR}" # Output
                     COMMENT
-                    "Generating ${builder} Sphinx documentation for ${project} into \"${SPHINX_BUILD_DIR}\"")
+                    "Generating ${builder} Sphinx documentation for ${project} into \"${SPHINX_BUILD_DIR}\""
+                    USES_TERMINAL)
 
   # When "clean" target is run, remove the Sphinx build directory
   set_property(DIRECTORY APPEND PROPERTY
To be honest, this sounds like a bad hack to me. The command does not actually use the terminal; instead you want to rely on a side effect of Ninja's current implementation. I can imagine this changing in the future and causing hard-to-track issues.
I don't think using the console pool on Ninja is a bigger hack than giving two targets that don't actually depend on each other a false dependency. If Ninja's console pool were ever to have a size greater than 1, our build would be fundamentally broken in *many* ways. While I understand that it might seem like a hack, that behavior is really fundamentally important.
@thakis is our resident Ninja expert (and contributor), he might have opinions.
I agree that relying on the console pool is a bit hacky. It looks like CMake doesn't allow defining custom pools though, and an explicit dependency needlessly constrains the build.
One thing we used to do in Chromium before we had ninja was to have a small Python wrapper with a lock file for exclusiveness. Linux has http://linux.die.net/man/1/flock, so you could just run "flock sphinx.lock <command>" there, but that's Linux-only. Our Mac implementation was:
def ExecFlock(self, lockfile, *cmd_list):
  """Emulates the most basic behavior of Linux's flock(1)."""
  # Rely on exception handling to report errors.
  fd = os.open(lockfile, os.O_RDONLY | os.O_NOCTTY | os.O_CREAT, 0o666)
  fcntl.flock(fd, fcntl.LOCK_EX)
  return subprocess.call(cmd_list)
The lock file model seems like a good fit for sphinx, and it'd work with all of CMake's generators.
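A sketch of how such a wrapper could be wired into the target (the flock.py location, lock-file path, and target name are all hypothetical; SPHINX_EXECUTABLE is the usual FindSphinx variable):

# Hypothetical sketch: funnel every sphinx-build invocation that shares a
# doctree through a lock-holding wrapper so only one runs at a time.
add_custom_target(docs-${project}-${builder}
  COMMAND "${PYTHON_EXECUTABLE}"
          "${LLVM_MAIN_SRC_DIR}/utils/flock.py"      # assumed wrapper script
          "${CMAKE_CURRENT_BINARY_DIR}/sphinx.lock"  # shared lock file
          "${SPHINX_EXECUTABLE}" -b ${builder}
          "${CMAKE_CURRENT_SOURCE_DIR}"  # Source
          "${SPHINX_BUILD_DIR}")         # Output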
Well, there's also another option: use separate SPHINX_DOC_TREE_DIRs, i.e. make the two targets not share a doctree directory, so that they become fully parallel-friendly.
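Concretely, sphinx-build's -d option selects the doctree directory, so each builder can be given a private cache; a sketch with illustrative variable and target names:

# Sketch: give every builder its own doctree directory so parallel
# sphinx-build invocations never touch the same cache files.
set(SPHINX_DOC_TREE_DIR "${CMAKE_CURRENT_BINARY_DIR}/${builder}-doctrees")
add_custom_target(docs-${project}-${builder}
  COMMAND ${SPHINX_EXECUTABLE}
          -b ${builder}
          -d "${SPHINX_DOC_TREE_DIR}"    # builder-private doctree cache
          "${CMAKE_CURRENT_SOURCE_DIR}"  # Source
          "${SPHINX_BUILD_DIR}")         # Output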
@mgorny, that might be the best/simplest option. Does that have any performance drawbacks? Could you maybe time the difference between running the two targets in serial with your patch vs running them in parallel with different tree dirs?
I've done two cold-cache runs on my old dual-core Athlon64 (both runs gave approximately the same results):

serial:
  real    1m32.180s
  user    1m17.122s
  sys     0m0.797s

parallel:
  real    1m30.775s
  user    2m0.950s
  sys     0m0.898s
If I'm reading the results right, both variants give similar wall-clock times, though the parallel run uses roughly 50% more CPU time (user time grows from ~77s to ~121s) because it cannot reuse the doctree cache built by the other target.
Looking into what Sphinx does, the 'reading sources' phase (where the cache is generated) takes the most time, writing HTML files comes second, and writing manpages takes very little time. So while the most efficient solution would be to actually lock the doctree in Sphinx, forcing its creation to run serially and then producing both output formats in parallel, the manpage target takes so little time that it probably doesn't matter.
I've reported this as bug #2946 on the Sphinx side and attempted to solve it there, but found no sane way of doing so with Sphinx's current code design. I'll update this to use the requested Ninja hack, then.
@beanz, one more thing just occurred to me wrt USES_TERMINAL: it forces serialization globally, while serialization is only necessary per docroot (i.e. per CMAKE_CURRENT_BINARY_DIR).
So with the dependency solution you can e.g. build the llvm and clang docs in parallel, while the USES_TERMINAL solution will unnecessarily serialize them.
Could you then reconsider my original patch?
@mgorny, I'd rather we go with the parallel solution. While it does use more CPU time, it doesn't adversely impact overall performance, and it doesn't artificially constrain the build graph.
Are you sure about that? It doesn't impact performance much on a dual-core system, but on a single core it will take roughly twice as long.
Why are you concerned about the build graph? I still think the dependency is somewhat logical, though it really models a side effect of running one of those commands (the shared cache file) rather than its actual output. I don't know if that could be expressed more gracefully in CMake.
I don't think it is worth optimizing for single-core machines. Multi-core CPUs are the norm these days, so I would prefer not to optimize for single-core ones.
> Why are you concerned about the build graph?
Having a non-constrained build graph allows multi-core CPUs to better utilize their resources.
An alternative solution would be to use Sphinx's DummyBuilder to generate the doctree. It is a builder that produces no output, so it should suffice as a way to generate the doctree which could then be shared between the other documentation builds.
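Roughly like this, as a hedged sketch (the doctree target name is made up, and as noted below the 'dummy' builder only exists in newer Sphinx releases):

# Sketch: build the shared doctree once with the no-output 'dummy' builder,
# then let the real builders depend on it and reuse the cache.
add_custom_target(docs-${project}-doctree
  COMMAND ${SPHINX_EXECUTABLE} -b dummy
          -d "${SPHINX_DOC_TREE_DIR}"
          "${CMAKE_CURRENT_SOURCE_DIR}" "${SPHINX_BUILD_DIR}")
add_dependencies(docs-${project}-html docs-${project}-doctree)
add_dependencies(docs-${project}-man  docs-${project}-doctree)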
Tell that to the person who is running Gentoo on an old x86, because most binary distros don't even work on CPUs that old ;-).
> Why are you concerned about the build graph?
> Having a non-constrained build graph allows multi-core CPUs to better utilize their resources.
I disagree here. We're talking about losing a major caching opportunity. We lose the opportunity to run twice as many Sphinx instances in parallel, but the free cores are better utilized compiling object files in the meantime. If I read the timing right, I would guess around 75% of the CPU time is spent building the doctree.
Besides, Sphinx already does parallel jobs internally.
> An alternative solution would be to use Sphinx's DummyBuilder to generate the doctree. It is a builder that produces no output, so it should suffice as a way to generate the doctree which could then be shared between the other documentation builds.
That would be nice, but I should point out that the 'dummy' builder is not available as a separate builder in sphinx-1.3.5 (which is the current version in Gentoo for some reason). It seems that upstream split it out 3 months ago, so this would mean restricting LLVM to very fresh Sphinx versions.
I care very little about minor impacts on downstream consumers of source distributions. I care far more about the people who build LLVM regularly, because this impacts us every day. Further, I strongly believe that we need to build our software with the future in mind, and multiprocessing is both the present and the future. Single-core CPUs are becoming less and less common every day; optimizing for them doesn't seem prudent to me.
> I disagree here. We're talking about losing a major caching opportunity. We lose the opportunity to run twice as many Sphinx instances in parallel, but the free cores are better utilized compiling object files in the meantime. If I read the timing right, I would guess around 75% of the CPU time is spent building the doctree.
> Besides, Sphinx already does parallel jobs internally.
This is just an argument for why my earlier suggestion of USES_TERMINAL would not be a bad fit. If Sphinx made significant use of multi-threading, you wouldn't want more than one instance of it running at a time. While I know it uses multiple threads, based on your serial and parallel timing tests I suspect they aren't CPU-bound.
> That would be nice, but I should point out that the 'dummy' builder is not available as a separate builder in sphinx-1.3.5 (which is the current version in Gentoo for some reason). It seems that upstream split it out 3 months ago, so this would mean restricting LLVM to very fresh Sphinx versions.
I didn't realize the dummy builder was that new. Requiring a very new sphinx is not reasonable.
I'm more concerned about processing the same data multiple times when we have an opportunity to do it once and cache the result. While I can agree that my original method was hacky, it seemed to provide the best performance for everyone.
Anyway, I don't think we're going to reach an agreement here, so I've updated the patch to safely run Sphinx in parallel, as you last requested (I think). In most cases the performance won't change; the difference is that two doctrees are now created instead of one being created and then overwritten.