The abstractions used for iterating over sequences of types turn out to be slow. There is no easy fix within those abstractions, because the cost is the abstraction overhead itself: the internal abstractions involve several layers of pointer indirection and shared_ptr copies, which is expensive given that this code is in the hot path.
Going back to raw byte-level constructs speeds this up significantly, on the order of 10%-15% of total link time. The particular test case here went from 41.6s (pre-patch) to 36s (post-patch), an improvement of roughly 13%.
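
As a rough illustration of what iterating with raw byte-level constructs can look like, here is a minimal sketch. It is not the code from this patch. The record layout (a 2-byte little-endian length, then a 2-byte kind, then the payload) and the names RecordView, readU16LE, and forEachRecord are assumptions made up for the example. The point is only that the loop walks the buffer in place, with no per-record allocation and no shared_ptr copies.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record layout, assumed for illustration only:
//   [u16 LE length of everything after this field][u16 LE kind][payload...]
struct RecordView {
  uint16_t kind;          // record kind tag
  const uint8_t *payload; // points into the original buffer, no copy
  size_t payloadSize;     // number of payload bytes
};

// Reads a 16-bit little-endian value from two bytes.
static uint16_t readU16LE(const uint8_t *p) {
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// Walks the buffer in place and calls fn once per record.
// Returns false if the buffer is truncated or a length is invalid.
template <typename Fn>
bool forEachRecord(const uint8_t *data, size_t size, Fn fn) {
  size_t off = 0;
  while (off < size) {
    // Need at least the length field and the kind field.
    if (size - off < 4)
      return false;
    size_t len = readU16LE(data + off);
    // The length must cover the kind field and stay inside the buffer.
    if (len < 2 || len > size - off - 2)
      return false;
    uint16_t kind = readU16LE(data + off + 2);
    fn(RecordView{kind, data + off + 4, len - 2});
    off += 2 + len; // skip the length field plus the record body
  }
  return true;
}
```

A caller passes a pointer, a size, and a callback, e.g. forEachRecord(buf, bufSize, [&](const RecordView &r) { ... }). Each RecordView is only valid for as long as the underlying buffer is.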