When trying to enable the mischeduler on SystemZ, a regression was discovered which was due to how instructions using copies of physregs were scheduled.
The attached test case is the input to mischeduler. The observation here is that %R3D is copied into %vreg1, while %R3D is also written later from %vreg7. We would want the ADJDYNALLOCs using %vreg1 to be scheduled before the one defining %vreg7, so that regalloc will be able to give %R3D to %vreg1 and later to %vreg7 without overlap.
The isel scheduler handled this by seeing that the ADJDYNALLOC using %vreg1 has one more live range, which it wants to minimize, when comparing the ADJDYNALLOCs.
This patch adds a simple heuristic to tryCandidate() which handles this test case. It also seems to generally reduce the number of COPYs across benchmarks. Basically it extends the handling for physregs by looking at virtual registers which are defined by a single COPY of a physreg (%vreg1 in this example). A preference is then given to the SU that uses such a vreg, in order to minimize that live range.
I have tried different positions for this new heuristic in tryCandidate(). Anywhere above the RPDelta.CurrentMax check seems bad, as it increases spilling. Below that check, the exact position seems to make almost no difference, so I put it as the last heuristic, just before original instruction order.
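To make the intended ordering concrete, here is a minimal standalone sketch of the tie-breaking chain described above. It is not the patch itself and does not use LLVM's classes: ToySU, tryCandidateSketch() and pickSketch() are hypothetical names invented for illustration, MaxPressureDelta is only a stand-in for the RPDelta.CurrentMax comparison, and all the other checks in the real tryCandidate() are omitted. It also assumes top-down scheduling, where a lower node number means earlier in the original order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a scheduling unit (hypothetical, not LLVM's SUnit).
struct ToySU {
  unsigned NodeNum;         // position in the original instruction order
  int MaxPressureDelta;     // stand-in for RPDelta.CurrentMax; lower is better
  bool UsesPhysRegCopyVReg; // uses a vreg defined by a single COPY of a physreg
};

// Returns true if Try should replace Cand as the best candidate so far.
bool tryCandidateSketch(const ToySU &Try, const ToySU &Cand) {
  // 1. Register pressure is compared first, so the new rule can never
  //    override it (placing the rule above this increased spilling).
  if (Try.MaxPressureDelta != Cand.MaxPressureDelta)
    return Try.MaxPressureDelta < Cand.MaxPressureDelta;

  // 2. New rule: prefer the SU that uses a vreg copied from a physreg,
  //    so that the vreg's live range ends as early as possible.
  if (Try.UsesPhysRegCopyVReg != Cand.UsesPhysRegCopyVReg)
    return Try.UsesPhysRegCopyVReg;

  // 3. Final fallback: original instruction order (top-down assumption).
  return Try.NodeNum < Cand.NodeNum;
}

// Scans the ready queue and returns the preferred candidate.
const ToySU &pickSketch(const std::vector<ToySU> &Queue) {
  assert(!Queue.empty() && "need at least one candidate");
  std::size_t Best = 0;
  for (std::size_t I = 1; I < Queue.size(); ++I)
    if (tryCandidateSketch(Queue[I], Queue[Best]))
      Best = I;
  return Queue[Best];
}
```

With two candidates of equal pressure, the one flagged as using the copied vreg is picked even if it comes later in the original order. If it has a worse pressure delta, the pressure check decides first and the new rule is never reached.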
A lot of test cases fail with this patch:
X86: 321
AMDGPU: 36
AArch64: 13
PowerPC, SPARC: 3 each
Lanai: 2
ARM: 1
So before going any further, I would like feedback on the feasibility of inserting this heuristic in the mischeduler.
Is this the right handling of this test case in mischeduler (I assume it is a scheduling problem)?
This is a non-obvious rule and needs more comments and a better function name!