This patch adds PipelineEmitter, an abstract class that lets custom code generators plug into MachinePipeliner. It provides a single implementation, SingleEpilogPipelineEmitter, which is equivalent to the current experimental code generator.
The implementation's name distinguishes it from other code generators that will appear in the future. Quoting from the header:
/// PipelinerEmitter implementation that produces prolog blocks linked to a
/// single shared set of epilog blocks. This reduces the number of epilog blocks
/// required, but assumes that it is correct to emit stages in non-FIFO order:
///
///   Prolog0           -- executes A[0]
///      |        \
///   Prolog1   \  |    executes A[1], B[0]
///      |       \ |
///   Prolog2   \ ||    executes A[2], B[1], C[0]
///      |       |||
///   Kernel     |||    executes X[3], Y[2], Z[1], W[0]
///      |      / ||
///   Epilog0     ||    executes (A or Y)[3]
///      |       / |
///   Epilog1     /     executes (A or Z)[2, 3]
///      |       /
///   Epilog2           executes (A or W)[1, 2, 3]
///
/// Note that assuming the kernel is taken the epilog order is:
///   Y[3], Z[2], Z[3], W[1], W[2], W[3]
/// Whereas a FIFO order for the epilog would be:
///   Y[3], Z[2], W[1], Z[3], W[2], W[3]
///
/// For many targets this ordering does not matter. However, if a hardware
/// functional unit is being used that depends on FIFO order (for example a
/// streaming hardware unit), wrong code may be generated.
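To make the difference between the two orderings concrete, here is a small standalone sketch that enumerates both for the four-stage example in the comment. It is illustrative only and not part of the patch: the function names sharedEpilogOrder and fifoEpilogOrder, the single-character stage names, and the LastIter parameter are all assumptions made for this example, and it assumes the kernel has executed exactly once.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not LLVM API): formats stage S of iteration I,
// e.g. stage 'Y' of iteration 3 becomes "Y[3]".
static std::string stageLabel(char Stage, int Iter) {
  return std::string(1, Stage) + "[" + std::to_string(Iter) + "]";
}

// Order produced by a single shared chain of epilog blocks: epilog block
// number S-1 finishes stage S for every iteration that still needs it, so
// the order is stage-major. Stages[0] is the first pipeline stage and
// LastIter is the newest iteration started by the kernel.
std::vector<std::string> sharedEpilogOrder(const std::string &Stages,
                                           int LastIter) {
  std::vector<std::string> Order;
  const int NumStages = static_cast<int>(Stages.size());
  for (int S = 1; S < NumStages; ++S)
    for (int Iter = LastIter - (S - 1); Iter <= LastIter; ++Iter)
      Order.push_back(stageLabel(Stages[S], Iter));
  return Order;
}

// FIFO order: drain the pipeline one step at a time, as if the kernel kept
// running without starting new iterations. Within each step the stages run
// in the same order as in the kernel.
std::vector<std::string> fifoEpilogOrder(const std::string &Stages,
                                         int LastIter) {
  std::vector<std::string> Order;
  const int NumStages = static_cast<int>(Stages.size());
  for (int Step = 1; Step < NumStages; ++Step)
    for (int S = Step; S < NumStages; ++S)
      Order.push_back(stageLabel(Stages[S], LastIter - (S - Step)));
  return Order;
}
```

Calling sharedEpilogOrder("XYZW", 3) yields the first sequence listed in the comment and fifoEpilogOrder("XYZW", 3) yields the second. The two contain the same stage instances and differ only in where W[1] and Z[3] fall, which is exactly the reordering a FIFO-dependent functional unit would observe.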