There are a variety of cases where we want more control over the exact
instruction emitted. This commit creates a new pass to fix up
instructions after the DAG has been lowered. The pass is only meant to
replace instructions that are guaranteed to be interchangeable, not to
do analysis for special cases.
Handling these instruction changes in X86ISelLowering or
X86ISelDAGToDAG isn't ideal, as it's liable to either break existing
patterns that expect a certain instruction or generate infinite
loops.
Currently, only vpermilps -> vshufps/vpshufd is implemented, but
more cases can be added.
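
To illustrate why this particular replacement counts as "guaranteed to be interchangeable", here is a minimal standalone sketch that models the element selection of the three instructions on a single 128-bit lane. It is not code from the pass: the `Lane` type and all function names are hypothetical, and the model covers only which element ends up where, not latency or execution domain. The immediate form of vpermilps and vpshufd use the same selection rule, and vshufps with both sources set to the same register reduces to that rule as well.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: one 128-bit lane viewed as four 32-bit elements.
struct Lane {
  uint32_t e[4];
};

// vpermilps (immediate form) and vpshufd share one selection rule:
// destination element i takes source element (imm >> (2 * i)) & 3.
Lane permuteByImm(const Lane &src, uint8_t imm) {
  Lane r{};
  for (int i = 0; i < 4; ++i)
    r.e[i] = src.e[(imm >> (2 * i)) & 3];
  return r;
}

// vshufps: the low two destination elements are selected from the first
// source, the high two from the second source.
Lane shufps(const Lane &a, const Lane &b, uint8_t imm) {
  Lane r{};
  r.e[0] = a.e[imm & 3];
  r.e[1] = a.e[(imm >> 2) & 3];
  r.e[2] = b.e[(imm >> 4) & 3];
  r.e[3] = b.e[(imm >> 6) & 3];
  return r;
}

bool sameLane(const Lane &x, const Lane &y) {
  for (int i = 0; i < 4; ++i)
    if (x.e[i] != y.e[i])
      return false;
  return true;
}

// Exhaustively compares the two forms over all 256 immediates, using
// distinct element values so any wrong selection is visible.
bool rewriteIsSafeForAllImmediates() {
  const Lane x{{10, 11, 12, 13}};
  for (int imm = 0; imm < 256; ++imm) {
    const uint8_t i8 = static_cast<uint8_t>(imm);
    if (!sameLane(permuteByImm(x, i8), shufps(x, x, i8)))
      return false;
  }
  return true;
}
```

Calling `assert(rewriteIsSafeForAllImmediates());` from a test harness checks the equivalence for every immediate. The same immediate can be reused unchanged in the replacement instruction, which is why no extra analysis is needed for this case.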
Maybe rename to TuningNoDomainDelay/TuningNoDomainDelayShuffle? "Bypass" could mean many things.