Hi @cameron.mcinally, I'm just sharing what I tried out today based
on your patch D80260. I'm not really planning to land it, but feel
free to use it for reference, or discard it entirely if you've already
been working on something similar.
This patch passes a DUP(0) to the zero-merging pseudos, similar to what
D80260 does for any other merge value.
This patch also highlights a bug that currently exists in the expansion
of the pseudo instructions that merge zeros into the false lanes.
The zero-merging pseudos have no tied-operand constraints, which gives
the register allocator more freedom to use the reverse instructions.
The bug shows up when register allocation for one of the pseudos
ends up as:
Dst = FSUB_ZERO_S P0, Z0, Z0
The expand pass cannot zero the false lanes of Z0 using MOVPRFX, because
the MOVPRFX instruction specifies that the destination register must not
be used in any operand position of the prefixed instruction other than
the destination. This would not be valid:
Z0 = MOVPRFX P0/z, Z0
Z0 = FSUB_S Z0, P0/m, Z0
                      ^^
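
The constraint above can be sketched as a small predicate. This is only an illustration and not LLVM code: `Reg`, `DestructiveOp` and `canPrefixWithZeroingMovprfx` are hypothetical names, and registers are modelled as plain integers.

```cpp
// Hypothetical model, not LLVM API: registers are plain integers.
using Reg = unsigned;

// A destructive predicated op such as "Zdn = FSUB_S Zdn, Pg/m, Zm".
struct DestructiveOp {
  Reg Dst;   // destination, tied to the first source operand (Zdn)
  Reg Other; // second source operand (Zm)
};

// A MOVPRFX may only prefix the op if the destination register does not
// appear in any operand position other than the destination itself.
bool canPrefixWithZeroingMovprfx(const DestructiveOp &Op) {
  return Op.Dst != Op.Other;
}
```

With `Dst == Other == Z0`, as in the sequence above, the predicate returns false, which is exactly the case the expand pass cannot handle today.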
At the point of expanding the pseudo, there may not be a spare register
available to expand this into a legal sequence. In D71712 we've solved
this by using a 'Conditional Early Clobber' pass that runs during register
allocation and makes sure the destination register is different from
any of the input registers, if the two input registers would otherwise end
up the same. This is a bit fiddly, and it's probably better to build
on the design set out in D80260, where the merge value is passed
to the pseudo, so the compiler can decide at the point of pseudo expansion
whether to use the DUP(0) value or the zeroing MOVPRFX.
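
As a rough sketch of that decision (again not LLVM code), the expansion could pick a strategy from the allocated registers. `Expansion` and `chooseExpansion` are hypothetical names, registers are plain integers, and the reversed case is my assumption about how the "reverse instructions" mentioned earlier would be used.

```cpp
// Hypothetical model, not LLVM API: registers are plain integers.
using Reg = unsigned;

enum class Expansion {
  ZeroingMovprfx,         // MOVPRFX Dst, Pg/z, Op1 + forward op with Op2
  ZeroingMovprfxReversed, // MOVPRFX Dst, Pg/z, Op2 + reverse op with Op1
  MergeValue              // fall back to the DUP(0) value passed to the pseudo
};

// Decide how to expand "Dst = <OP>_ZERO Pg, Op1, Op2".
Expansion chooseExpansion(Reg Dst, Reg Op1, Reg Op2) {
  // The forward form is legal if Dst is not also the second source.
  if (Dst != Op2)
    return Expansion::ZeroingMovprfx;
  // Assumption: the reverse form swaps the sources, so it is legal
  // when Dst is not also the first source.
  if (Dst != Op1)
    return Expansion::ZeroingMovprfxReversed;
  // Dst == Op1 == Op2: no MOVPRFX form is legal, so use the merge value.
  return Expansion::MergeValue;
}
```

For the problematic allocation `Z0 = FSUB_ZERO_S P0, Z0, Z0`, this model selects `MergeValue`, so no spare register has to be found at expansion time.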
Given that the DUP IMM instructions have isReMaterializable set,
the register allocator hopefully won't try too hard to keep it in a