I got annoyed enough with sitting around waiting for TableGen to complete during a build that I profiled it. While there's quite a bit more that can be done, this patch covers the really easy stuff: tons of string allocations and re-allocations, STL stringstream usage, and STL set container usage.
With this patch, X86GenDAGISel is reduced from ~45 seconds to ~33 seconds on my machine, which is a ~1.3x speedup.
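To give a rough idea of the kind of change involved, here is a minimal sketch. It is not code from the patch: `joinWithStream` and `joinWithReserve` are hypothetical names made up for illustration, and the real call sites in TableGen look different. The first version builds its result through a `std::stringstream`; the second computes an upper bound on the size, reserves once, and appends in place, so the string normally doesn't need to re-allocate as it grows.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical "before" helper: joins names through a std::stringstream.
// Every call constructs a stream and then copies the result out via str().
std::string joinWithStream(const std::vector<std::string> &Names) {
  std::stringstream SS;
  for (std::size_t I = 0; I != Names.size(); ++I) {
    if (I != 0)
      SS << ", ";
    SS << Names[I];
  }
  return SS.str();
}

// Hypothetical "after" helper: same output, but reserves an upper bound
// on the final size up front and appends directly into the string.
std::string joinWithReserve(const std::vector<std::string> &Names) {
  std::size_t Total = 0;
  for (const std::string &N : Names)
    Total += N.size() + 2; // name plus ", " separator (slight overestimate)

  std::string Out;
  Out.reserve(Total);
  for (std::size_t I = 0; I != Names.size(); ++I) {
    if (I != 0)
      Out += ", ";
    Out += Names[I];
  }
  return Out;
}
```

Both functions return the same string for the same input, so a swap like this shouldn't change the generated output, only how much allocation work it takes to produce it.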
The next biggest bottleneck is the class that indents and formats the output. Given that TableGen output is really only intended to be consumed by a machine, it makes sense to question whether the formatting is worth the performance cost. When I remove the formatting, the speedup jumps to 2x (44s -> 22s).
After I get the single-threaded execution time sufficiently fast, I might even try to parallelize it. It seems like a terribly obvious and easy candidate for parallelization given that we now have simple parallel algorithms in Support.
Feel free to add other reviewers if there's someone better.
What made us need this?