Add a new constraint to express that not all operands of an instruction can be
assigned the same register. This is necessary for efficient use of SVE's MOVPRFX
as MOVPRFX has a restriction that its destination register may only appear as
the tied source of the following instruction. MOVPRFX is used for two purposes:
to turn a destructive operation into a non-destructive operation, and to turn a
passthru-merging predicated operation into a zero-merging predicated operation.
The constraint is needed when using MOVPRFX for the latter.
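
The second purpose can be illustrated with a small lane-wise model. This is a
minimal sketch, not the hardware definition: the function names are
hypothetical, vectors are plain Python lists, and MOVPRFX is modelled as a
separate step even though hardware may fuse it with the following instruction.

```python
from typing import List


def movprfx_zeroing(pred: List[bool], src: List[float]) -> List[float]:
    """Model of `z = MOVPRFX p/z, src`: copy active lanes, zero inactive ones."""
    return [s if p else 0.0 for p, s in zip(pred, src)]


def fdiv_merge(pred: List[bool], tied: List[float], divisor: List[float]) -> List[float]:
    """Model of `a = FDIV p/m, a, c`: active lanes divide, inactive lanes keep a."""
    return [t / d if p else t for p, t, d in zip(pred, tied, divisor)]


def fdiv_zero(pred: List[bool], b: List[float], c: List[float]) -> List[float]:
    """Reference zero-merging divide: a[i] = p[i] ? b[i] / c[i] : 0."""
    return [x / y if p else 0.0 for p, x, y in zip(pred, b, c)]


pred = [True, False, True, False]
b = [8.0, 6.0, 9.0, 1.0]
c = [2.0, 3.0, 3.0, 4.0]

# A zeroing MOVPRFX followed by the merging divide behaves like a
# zero-merging divide, because the inactive lanes were zeroed first.
prefixed = movprfx_zeroing(pred, b)
result = fdiv_merge(pred, prefixed, c)
print(result)  # [4.0, 0.0, 3.0, 0.0]
```

The inactive lanes of the merging divide pass through whatever the prefix left
in the destination, which is why zeroing them beforehand yields the
zero-merging result.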
As an example, consider FDIV and FDIVR, where FDIVR is a "reverse" divide:

  a = FDIV  p/m, b, c   # a[i] = p[i] ? b[i] / c[i] : a[i]
  a = FDIVR p/m, c, b   # a[i] = p[i] ? b[i] / c[i] : a[i]

These are defined with tied constraints (simplified from existing
AArch64SVEInstrInfo.td):

  let Constraints = "$Zdn = $_Zdn" in {
    // a[i] = p[i] ? b[i] / c[i] : a[i]
    defm FDIV_MERGE  : I<(outs z:$Zdn), (ins p:$Pg, z:$_Zdn, z:$Zm)>
    // a[i] = p[i] ? b[i] / c[i] : a[i]
    defm FDIVR_MERGE : I<(outs z:$Zdn), (ins p:$Pg, z:$_Zdn, z:$Zm)>
  }

There are no separate zero-merging versions of these instructions. However,
we can define pseudo-instructions that act as zero-merging instructions:

  let Constraints = "not_all_same($Zdn, $_Zdn, $Zm)" in {
    // a[i] = p[i] ? b[i] / c[i] : 0
    defm FDIV_ZERO  : I<(outs z:$Zdn), (ins p:$Pg, z:$_Zdn, z:$Zm)>
    // a[i] = p[i] ? b[i] / c[i] : 0
    defm FDIVR_ZERO : I<(outs z:$Zdn), (ins p:$Pg, z:$_Zdn, z:$Zm)>
  }

Instruction selection can generate one of these:

  vz0 = FDIV_ZERO pz1/z, vz2, vz3

If register allocation can't easily tie vz0 and vz2, it could produce this:

  z0 = FDIV_ZERO p1, z2, z3

This is trivially expandable with MOVPRFX:

  z0 = MOVPRFX p1/z, z2        # Both copies and makes FDIV_MERGE zero-merging
  z0 = FDIV_MERGE p1/m, z0, z3

Note that even if the register allocator could tie vz0 and vz2, we'd still need
the MOVPRFX to support zero-merging, so there's no advantage to using a tied
constraint.
But say instruction selection produces this:

  vz0 = FDIV_ZERO pz1/z, vz2, vz2

Note that even though both sources use the same virtual register, we may not be
able to replace this instruction with "1.0" due to traps or IEEE conformance, so
we can't just delete it.
Let's say instead of not_all_same we'd specified the constraint as "$Zdn =
$_Zdn" (i.e. tied) or had no constraints at all. Register allocation could
produce this without any extra copies:

  z0 = FDIV_ZERO p1, z0, z0    # vz2 was dead after this instruction

When we go to expand the pseudo we might naively do this:

  z0 = MOVPRFX p1/z, z0
  z0 = FDIV_MERGE p1/m, z0, z0

Except now we've violated the constraint on MOVPRFX that its destination can
only appear on the tied source operand and nowhere else. So we need an extra
MOV:

  z1 = MOV z0                  # Satisfy the MOVPRFX constraint
  z0 = MOVPRFX p1/z, z0        # Needed to make FDIV_MERGE zero-merging
  z0 = FDIV_MERGE p1/m, z0, z1

If we use the not_all_same constraint instead, register allocation can assign a
different register to vz0:

  z1 = FDIV_ZERO p1, z0, z0

Now we can easily expand this without an extra MOV:

  z1 = MOVPRFX p1/z, z0        # Both copies and makes FDIV_MERGE zero-merging
  z1 = FDIV_MERGE p1/m, z1, z0

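
The expansions discussed above can be sketched as a small Python model. This is
only an illustration under stated assumptions, not the actual LLVM pseudo
expansion: `movprfx_ok` and `expand_fdiv_zero` are hypothetical helpers,
registers are plain strings, and the caller is assumed to supply a free scratch
register that differs from every operand.

```python
from typing import List


def movprfx_ok(prefix_dst: str, op_dst: str, op_tied: str, op_other: str) -> bool:
    """MOVPRFX restriction as described in the text: its destination may
    appear in the following instruction only as the destination and the
    tied source, never as the other source."""
    return op_dst == prefix_dst and op_tied == prefix_dst and op_other != prefix_dst


def expand_fdiv_zero(dst: str, pg: str, src1: str, src2: str, scratch: str) -> List[str]:
    """Expand `dst = FDIV_ZERO pg, src1, src2` into MOVPRFX + FDIV_MERGE.

    `scratch` is assumed to be a free register (hypothetical parameter).
    """
    out: List[str] = []
    divisor = src2
    if not movprfx_ok(dst, dst, dst, divisor):
        # The destination is also the untied source, so copy that source
        # away before the prefix overwrites it.
        out.append(f"{scratch} = MOV {src2}")
        divisor = scratch
    out.append(f"{dst} = MOVPRFX {pg}/z, {src1}")
    out.append(f"{dst} = FDIV_MERGE {pg}/m, {dst}, {divisor}")
    return out


# All three operands in z0: an extra MOV is required.
all_same = expand_fdiv_zero("z0", "p1", "z0", "z0", scratch="z1")
# Destination differs from the sources: two instructions suffice.
distinct_dst = expand_fdiv_zero("z1", "p1", "z0", "z0", scratch="z2")

print("\n".join(all_same))
print("\n".join(distinct_dst))
```

The sketch always falls back to a MOV when the destination aliases the untied
source. It only aims to show the cost difference between the two allocations
discussed in the text.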
Note that an @earlyclobber constraint will not save us because it would
pessimize the lowering of this:

  vz0 = FDIV_ZERO vp1, vz2, vz3

If vz2 were dead after the FDIV_ZERO, we could allocate vz0 and vz2 to the same
register and trivially lower the FDIV_ZERO. If vz0 had an @earlyclobber
constraint, we would not be able to allocate vz0 and vz2 to the same register,
pessimizing register allocation. Similarly, if one of vz2 or vz3 were
@earlyclobber, then vz2 and vz3 could not be the same register, which would
require an extra MOV for our second example above.
Since neither the tied nor the @earlyclobber constraint can express the needed
semantics, we need a new constraint type that accurately expresses the register
allocation restrictions on these zero-merging pseudo-instructions, which is the
role that not_all_same plays.
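
To make the comparison concrete, the sketch below enumerates every assignment
of three physical registers to (destination, first source, second source) and
filters it by each constraint. It is a toy model under assumptions: the
predicate functions are hypothetical names, the register file has only three
registers, and @earlyclobber is modelled as "the destination may not share a
register with any source".

```python
from itertools import product

REGS = ("z0", "z1", "z2")  # toy register file, an assumption for illustration


def tied(dst: str, src1: str, src2: str) -> bool:
    """Tied constraint: destination must equal the first source."""
    return dst == src1


def earlyclobber_dst(dst: str, src1: str, src2: str) -> bool:
    """@earlyclobber on the destination: it may not overlap any source."""
    return dst != src1 and dst != src2


def not_all_same(dst: str, src1: str, src2: str) -> bool:
    """Proposed constraint: only the all-equal assignment is rejected."""
    return not (dst == src1 == src2)


def allowed(constraint):
    """Return every (dst, src1, src2) assignment the constraint permits."""
    return [a for a in product(REGS, repeat=3) if constraint(*a)]


tied_ok = allowed(tied)
ec_ok = allowed(earlyclobber_dst)
nas_ok = allowed(not_all_same)

print(len(tied_ok), len(ec_ok), len(nas_ok))  # 9 12 24
```

In this model the tied constraint still admits the problematic all-same
assignment, @earlyclobber rejects the profitable case where the destination
reuses the first source, and not_all_same rejects only the three all-same
assignments.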