Adding inverse_throughput mode showed that snippet generators and benchmark runners are not paired one-to-one: that mode uses the UopsSnippetGenerator with the LatencyBenchmarkRunner. To keep the code consistent, name the snippet generators after what they produce (parallel or serial snippets) rather than after a benchmark runner.
Renaming LatencySnippetGenerator -> SerialSnippetGenerator.
Renaming UopsSnippetGenerator -> ParallelSnippetGenerator.
Renaming Uops -> Parallel in types and functions related to the ParallelSnippetGenerator.
I'm not sure how this can be reworded to reflect that it deals with parallel instructions in general rather than just the uops case.
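A rough sketch of the pairing that motivates the rename, written with hypothetical stand-in types rather than the real llvm-exegesis declarations. Only the inverse_throughput pairing is stated above; the latency and uops pairings are assumptions inferred from the class names.

```cpp
// Hypothetical stand-ins for illustration only; these are not the
// actual llvm-exegesis types.
enum class BenchmarkMode { Latency, Uops, InverseThroughput };
enum class GeneratorKind { Serial, Parallel };
enum class RunnerKind { Latency, Uops };

struct Pairing {
  GeneratorKind Generator;
  RunnerKind Runner;
};

// Maps a benchmark mode to the generator/runner it would use.
Pairing pairingFor(BenchmarkMode Mode) {
  switch (Mode) {
  case BenchmarkMode::Latency:
    // Assumed: serial snippets measured by the latency runner.
    return {GeneratorKind::Serial, RunnerKind::Latency};
  case BenchmarkMode::Uops:
    // Assumed: parallel snippets measured by the uops runner.
    return {GeneratorKind::Parallel, RunnerKind::Uops};
  case BenchmarkMode::InverseThroughput:
    // From the description: parallel snippets, latency runner.
    return {GeneratorKind::Parallel, RunnerKind::Latency};
  }
  // Unreachable for valid enum values; keeps compilers quiet.
  return {GeneratorKind::Serial, RunnerKind::Latency};
}
```

Under these assumptions the generator column depends only on whether the snippet is serial or parallel, which is why the new names no longer mention a runner.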