Currently, a move of constant zero is lowered into two MIR instructions after instruction selection:
    v1 = copy wzr/xzr
    v2 = copy v1
These two copies are coalesced in a later pass.
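As a rough illustration of what coalescing the two copies means, here is a toy C++ sketch. The `Copy` struct and the `coalesceCopies` function are hypothetical names made up for this example and are not LLVM APIs; the sketch also assumes that each intermediate value (such as v1) has exactly one use.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an instruction of the form "dst = copy src".
// Hypothetical type for illustration only; this is not LLVM's MachineInstr.
struct Copy {
    std::string dst;
    std::string src;
};

// Fold a copy whose source was defined by an earlier copy, so that
//   v1 = copy wzr; v2 = copy v1
// becomes
//   v2 = copy wzr
// Assumption: SSA-like input where the intermediate value has a single use,
// so its defining copy can simply be retargeted to the new destination.
std::vector<Copy> coalesceCopies(const std::vector<Copy>& in) {
    std::vector<Copy> out;
    for (const Copy& c : in) {
        bool folded = false;
        for (Copy& prev : out) {
            if (prev.dst == c.src) {
                prev.dst = c.dst;  // reuse the earlier copy, drop this one
                folded = true;
                break;
            }
        }
        if (!folded)
            out.push_back(c);
    }
    return out;
}
```

Applied to the two copies above, the sketch leaves a single copy from wzr/xzr into v2.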
One problem with this shows up in the Machine-Sink pass, which runs before the copy propagation pass. Machine-Sink can break a critical edge if at least two cheap MIR instructions can be sunk onto that path. Thus, we may end up with an MBB that contains only a single mov wzr/xzr instruction, which makes it difficult for block placement to do the layout. For example, the test case below, copy-zero-reg.ll, has a loop unrolled by two. Sinking the mov wzr/xzr makes it impossible to find a fallthrough for every MBB, and the currently generated code has a block that looks like this:
    // BB#1:
        mov w9, wzr
        cbnz w8, .LBB0_5
        b .LBB0_6
This patch coalesces the two COPYs during instruction selection. Below is the performance impact of this patch: