This patch contains the following improvements to MachineCombiner's cost model:
1. Ignore coalescable COPY instructions when computing latency, because they will be deleted after register allocation (RA).
2. When computing CycleCount, use (FirstDepth + RootLatency) instead of (RootDepth + RootLatency). This avoids double counting instructions when there are multiple instructions in either the old or the new instruction sequence.
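
The effect of both changes can be illustrated with a small standalone sketch. It is not the MachineCombiner code: `Instr`, `effectiveLatency`, `CycleCounts`, and `computeCycleCounts` are hypothetical names. The model assumes a straight-line chain in which each instruction depends on the previous one, and it assumes RootLatency is the latency summed over the whole sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one instruction in a straight-line dependency chain.
struct Instr {
  unsigned Latency;
  bool IsCoalescableCopy; // assumed to be determined elsewhere
};

// Change 1: a coalescable COPY is expected to disappear after RA,
// so it contributes no latency in this model.
unsigned effectiveLatency(const Instr &I) {
  return I.IsCoalescableCopy ? 0 : I.Latency;
}

struct CycleCounts {
  unsigned WithRootDepth;  // RootDepth + RootLatency
  unsigned WithFirstDepth; // FirstDepth + RootLatency
};

// FirstDepth is the depth of the first instruction in the sequence.
// RootDepth is the depth of the last one, which in this chain model already
// includes the latencies of every earlier instruction in the sequence.
CycleCounts computeCycleCounts(unsigned FirstDepth,
                               const std::vector<Instr> &Seq) {
  assert(!Seq.empty() && "sequence must contain at least one instruction");
  unsigned RootLatency = 0; // assumption: summed over the whole sequence
  unsigned RootDepth = FirstDepth;
  for (std::size_t I = 0; I < Seq.size(); ++I) {
    unsigned L = effectiveLatency(Seq[I]);
    RootLatency += L;
    if (I + 1 < Seq.size())
      RootDepth += L; // earlier instructions push the root deeper
  }
  return {RootDepth + RootLatency, FirstDepth + RootLatency};
}
```

Under these assumptions, a two-instruction sequence with latencies 3 and 2 starting at depth 4 gives 12 with the RootDepth formula, because the first instruction's 3 cycles are counted in both RootDepth and RootLatency, and 9 with the FirstDepth formula. For a single-instruction sequence the two formulas agree.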