This patch adds an experimental stage named MicroOpQueueStage.
A MicroOpQueueStage can be used to simulate a hardware micro-op queue (basically, a decoupling queue between 'decode' and 'dispatch').
Users can specify a queue size, as well as an optional MaxIPC (which, in the absence of a "Decoders" stage, can be used to simulate a different throughput from the decoders).
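To make the interaction between queue size and MaxIPC concrete, here is a minimal standalone sketch. It is a toy model, not the code in this patch: the names ToyMicroOpQueue and cyclesToDispatch are hypothetical, every instruction is assumed to occupy exactly one queue entry, and an instruction is assumed to spend at least one cycle in the queue before it can be dispatched.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical toy model of a micro-op queue; not part of llvm-mca.
class ToyMicroOpQueue {
  std::deque<unsigned> Buffer; // Instruction ids waiting for dispatch.
  std::size_t Capacity;        // Queue size, in entries.
  unsigned MaxIPC;             // Instructions per cycle; 0 means "no limit".
  unsigned CurrentIPC = 0;     // Instructions accepted in the current cycle.

public:
  ToyMicroOpQueue(std::size_t Size, unsigned IPC)
      : Capacity(Size), MaxIPC(IPC) {}

  // An instruction is accepted only if the per-cycle limit has not been
  // reached and the queue still has a free entry.
  bool isAvailable() const {
    if (MaxIPC && CurrentIPC == MaxIPC)
      return false;
    return Buffer.size() < Capacity;
  }

  bool push(unsigned Id) {
    if (!isAvailable())
      return false;
    Buffer.push_back(Id);
    ++CurrentIPC;
    return true;
  }

  // Dispatch side: remove up to Width entries and return how many were removed.
  unsigned drain(unsigned Width) {
    unsigned N = 0;
    while (N < Width && !Buffer.empty()) {
      Buffer.pop_front();
      ++N;
    }
    return N;
  }

  // Reset the per-cycle counter at the end of every simulated cycle.
  void cycleEnd() { CurrentIPC = 0; }
};

// Number of cycles needed to dispatch NumInstrs instructions through the toy
// queue. Returns 0 for degenerate configurations (empty queue or no dispatch
// bandwidth), where nothing could ever make progress.
unsigned cyclesToDispatch(unsigned NumInstrs, std::size_t QueueSize,
                          unsigned MaxIPC, unsigned DispatchWidth) {
  if (QueueSize == 0 || DispatchWidth == 0)
    return 0;
  ToyMicroOpQueue Queue(QueueSize, MaxIPC);
  unsigned Next = 0, Dispatched = 0, Cycles = 0;
  while (Dispatched < NumInstrs) {
    ++Cycles;
    // Dispatch drains entries queued in earlier cycles.
    Dispatched += Queue.drain(DispatchWidth);
    // The frontend then refills the queue, limited by MaxIPC and capacity.
    while (Next < NumInstrs && Queue.push(Next))
      ++Next;
    Queue.cycleEnd();
  }
  return Cycles;
}
```

In this toy model, 8 instructions with a 4-entry queue and a dispatch width of 4 take 3 cycles when MaxIPC is 0 (unlimited), and 5 cycles when MaxIPC is 2, because the frontend then becomes the limiting factor.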
This stage is added to the default pipeline between the EntryStage and the DispatchStage only if PipelineOption::MicroOpQueue is nonzero. By default, llvm-mca sets PipelineOption::MicroOpQueue to the value of the hidden flag -micro-op-queue-size.
Throughput from the decoder can be simulated via another hidden flag named -decoder-throughput.
That flag allows us to quickly experiment with different frontend throughputs. For targets that declare a loop buffer, -decoder-throughput lets users do multiple runs, each simulating a different throughput from the decoders.
This stage can and will be extended in the future. For example, we could add a "buffer full" event to identify bottlenecks caused by backpressure. The -decoder-throughput flag would probably go away if, in the future, we delegate the simulation of a (potentially variable) decoder throughput to another stage (DecoderStage?).
For now, the -decoder-throughput flag is "good enough" to run some simple experiments.
Let me know if okay to commit.
-Andrea
Maybe add a comment: // Instructions per cycle.