I got annoyed enough with waiting on TableGen during builds that I profiled it. There is still plenty more that could be done, but this patch only covers the really easy wins: tons of string allocations and reallocations, std::stringstream usage, and std::set usage.
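For readers unfamiliar with why these three patterns are slow, here is a minimal sketch in standard C++ of the general kind of rewrite involved. It is illustrative only and is not code from this patch; emitNamesSlow, emitNamesFast and uniqueSorted are hypothetical names, and the real TableGen code uses LLVM's own string and stream types rather than these std ones.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Before (hypothetical): each line goes through a fresh std::ostringstream,
// which allocates a stream object plus its buffer, then copies out via str().
std::string emitNamesSlow(const std::vector<std::string> &names) {
  std::string out;
  for (const std::string &n : names) {
    std::ostringstream os;
    os << "  " << n << ",\n";
    out += os.str();
  }
  return out;
}

// After (hypothetical): compute the final size, reserve once, and append
// directly, so the output buffer is allocated a single time.
std::string emitNamesFast(const std::vector<std::string> &names) {
  std::size_t total = 0;
  for (const std::string &n : names)
    total += n.size() + 4; // "  " + name + ",\n"
  std::string out;
  out.reserve(total);
  for (const std::string &n : names) {
    out.append("  ");
    out.append(n);
    out.append(",\n");
  }
  return out;
}

// Instead of a std::set (one heap node per element), collect into a vector
// and sort + de-duplicate once at the end.
std::vector<std::string> uniqueSorted(std::vector<std::string> items) {
  std::sort(items.begin(), items.end());
  items.erase(std::unique(items.begin(), items.end()), items.end());
  return items;
}
```

The sorted-vector trick only applies when the set is built up first and queried afterwards; if lookups are interleaved with inserts, a set (or a hash container) may still be the right choice.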
With this patch, generating X86GenDAGISel drops from ~45 seconds to ~33 seconds on my machine, a ~1.3x speedup.
The next biggest bottleneck is the class that indents and formats the output. Since TableGen output is really only meant to be consumed by a machine, it's fair to ask whether the formatting is worth its performance cost. When I removed the formatting, the speedup jumped to 2x (44s -> 22s).
Once the single-threaded execution time is fast enough, I might even try to parallelize it. It seems like an obvious and easy candidate for parallelization, now that we have simple parallel algorithms in Support.
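As a rough illustration of what such parallelization could look like, here is a sketch that assumes the per-record emission work is independent and touches no shared state (an assumption, not something verified for TableGen). It uses std::async from standard C++ rather than the LLVM Support parallel algorithms, and emitRecord is a hypothetical stand-in for a backend's per-record emitter. Each chunk writes into its own buffer, and the buffers are concatenated in the original order so the output stays deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical per-record emitter; must not mutate shared state.
std::string emitRecord(int record) {
  return "record_" + std::to_string(record) + ";\n";
}

// Split the records into numChunks contiguous chunks, emit each chunk on its
// own task into a private buffer, then join the buffers in submission order.
std::string emitAllParallel(const std::vector<int> &records, unsigned numChunks) {
  if (numChunks == 0)
    numChunks = 1;
  const std::size_t n = records.size();
  const std::size_t chunk = (n + numChunks - 1) / numChunks;

  std::vector<std::future<std::string>> parts;
  for (std::size_t begin = 0; begin < n; begin += chunk) {
    const std::size_t end = std::min(begin + chunk, n);
    parts.push_back(std::async(std::launch::async, [&records, begin, end] {
      std::string buf;
      for (std::size_t i = begin; i < end; ++i)
        buf += emitRecord(records[i]);
      return buf;
    }));
  }

  // get() blocks until each task finishes; records outlives all tasks because
  // every future is drained here before the function returns.
  std::string out;
  for (auto &p : parts)
    out += p.get();
  return out;
}
```

Joining in submission order, rather than as tasks complete, is what keeps the generated .inc files byte-for-byte reproducible regardless of thread scheduling.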
Feel free to add other reviewers if there's someone better.