Instead of trying to "fix" the original driver invocation by appending arguments to it, split it into multiple commands, and for each -cc1 command use a CompilerInvocation to give precise control over the invocation.
This change should make it easier in the future to canonicalize the command line (e.g. to improve hit rates in something like ccache), apply optimizations, or support multi-arch builds, which would require different modules for each arch.
In the long run it may make sense to treat the TU commands as a dependency graph, where each command has its own dependencies on modules or on earlier TU commands. For now they are simply a list that is executed in order, and the module dependencies are duplicated on every command. Since we currently only support single-arch builds, there is no parallelism to exploit in the execution anyway.
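To illustrate the "flat list, duplicated dependencies" model described above, here is a minimal self-contained sketch. It does not use the types from the patch: `TUCommand`, `withDuplicatedDeps`, and `runInOrder` are hypothetical names chosen for illustration, and the runner callback stands in for whatever actually executes a command.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one command obtained by splitting the driver
// invocation. Not the type used in the patch.
struct TUCommand {
  std::string Executable;
  std::vector<std::string> Arguments;
  // Module dependencies; duplicated on every command in this model.
  std::vector<std::string> ModuleDeps;
};

// Attach the same module dependencies to every command in the list.
std::vector<TUCommand>
withDuplicatedDeps(std::vector<TUCommand> Commands,
                   const std::vector<std::string> &ModuleDeps) {
  for (TUCommand &Cmd : Commands)
    Cmd.ModuleDeps = ModuleDeps;
  return Commands;
}

// Execute the commands strictly in list order, stopping at the first
// failure. There is no graph and no parallelism here.
bool runInOrder(const std::vector<TUCommand> &Commands,
                const std::function<bool(const TUCommand &)> &Run) {
  assert(Run && "a runner callback is required");
  for (const TUCommand &Cmd : Commands)
    if (!Run(Cmd))
      return false;
  return true;
}
```

Moving to a dependency graph later would mostly mean replacing the single shared `ModuleDeps` list with per-command edges and replacing the loop in `runInOrder` with a topological traversal.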
Have you considered using the Job/Command classes the driver uses? What are the downsides of doing that?