Add a new constraint to express that not all operands of an instruction can be
assigned the same register. This is necessary for efficient use of SVE's
MOVPRFX, because MOVPRFX has a restriction that its destination register may
only appear as the tied source of the following instruction. MOVPRFX is used
for two purposes: to turn a destructive operation into a non-destructive
operation, and to turn a passthru-merging predicated operation into a
zero-merging predicated operation. The new constraint is needed when using
MOVPRFX for the latter.
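Stated as a predicate over the physical registers assigned to the constrained
operands, the rule is simply that at least two of them must differ. The sketch
below is a hypothetical Python model (not the TableGen or register allocator
implementation) that enumerates every assignment of three operands to two
registers and reports which ones the constraint rejects.

```python
from itertools import product


def not_all_same(*assigned_regs):
    """Hypothetical model of not_all_same(...): at least two of the constrained
    operands must be assigned different physical registers."""
    return len(set(assigned_regs)) > 1


# Enumerate every assignment of three constrained operands to two registers.
regs = ["z0", "z1"]
assignments = list(product(regs, repeat=3))
rejected = [a for a in assignments if not not_all_same(*a)]
print(len(assignments), rejected)  # 8 [('z0', 'z0', 'z0'), ('z1', 'z1', 'z1')]
```

Only the fully coincident assignments are rejected; partial sharing, such as
the destination and one source landing in the same register, stays legal.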

As an example, consider FDIV and FDIVR, where FDIVR is a "reverse" divide:

a = FDIV  p/m, b, c    # a[i] = p[i] ? b[i] / c[i] : a[i]
a = FDIVR p/m, c, b    # a[i] = p[i] ? b[i] / c[i] : a[i]
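The lane-wise comments can be checked with a small Python model of the two
merging forms. Vector registers are modeled as lists of floats and predicates
as lists of booleans; fdiv_merge and fdivr_merge are hypothetical helpers, not
LLVM or SVE APIs, and the model ignores floating-point exceptions.

```python
def fdiv_merge(p, zdn, zm):
    """FDIV Zdn, Pg/M, Zdn, Zm: active lanes get zdn / zm, inactive lanes keep zdn."""
    return [x / y if active else x for active, x, y in zip(p, zdn, zm)]


def fdivr_merge(p, zdn, zm):
    """FDIVR Zdn, Pg/M, Zdn, Zm: active lanes get zm / zdn, inactive lanes keep zdn."""
    return [y / x if active else x for active, x, y in zip(p, zdn, zm)]


p = [True, False, True, False]
b = [8.0, 1.0, 9.0, 2.0]
c = [2.0, 5.0, 3.0, 4.0]

# a = FDIV p/m, b, c: the destination is tied to b, so inactive lanes keep b.
a_fdiv = fdiv_merge(p, b, c)
# a = FDIVR p/m, c, b: the destination is tied to c, so inactive lanes keep c.
a_fdivr = fdivr_merge(p, c, b)
print(a_fdiv)   # [4.0, 1.0, 3.0, 2.0]
print(a_fdivr)  # [4.0, 5.0, 3.0, 4.0]
```

Active lanes agree; inactive lanes differ because each form preserves
whichever register is tied to the destination (b for FDIV, c for FDIVR).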

These are defined with tied constraints (simplified from existing
AArch64SVEInstrInfo.td):

let Constraints = "$Zdn = $_Zdn" in {
  // a[i] = p[i] ? b[i] / c[i] : a[i]
  def FDIV_MERGE  : I<(outs z:$Zdn), (ins p:$Pg, z:$_Zdn, z:$Zm)>;
  // a[i] = p[i] ? b[i] / c[i] : a[i]
  def FDIVR_MERGE : I<(outs z:$Zdn), (ins p:$Pg, z:$_Zdn, z:$Zm)>;
}

There are no separate zero-merging versions of these instructions. However,
we can define pseudo-instructions that act as zero-merging instructions:

let Constraints = "not_all_same($Zdn, $_Zdn, $Zm)" in {
  // a[i] = p[i] ? b[i] / c[i] : 0
  def FDIV_ZERO  : I<(outs z:$Zdn), (ins p:$Pg, z:$_Zdn, z:$Zm)>;
  // a[i] = p[i] ? b[i] / c[i] : 0
  def FDIVR_ZERO : I<(outs z:$Zdn), (ins p:$Pg, z:$_Zdn, z:$Zm)>;
}

Instruction selection can generate one of these:

vz0 = FDIV_ZERO vp1/z, vz2, vz3

If register allocation can't easily tie vz0 and vz2, it could produce this:

z0 = FDIV_ZERO p1, z2, z3

This is trivially expandable with MOVPRFX:

z0 = MOVPRFX p1/z, z2          # Copies z2 and makes FDIV_MERGE zero-merging
z0 = FDIV_MERGE p1/m, z0, z3
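A lane model shows why this expansion is zero-merging: the zeroing MOVPRFX
clears the inactive lanes, and FDIV_MERGE then leaves those lanes untouched.
The helper names below are hypothetical, and the model ignores floating-point
exceptions.

```python
def movprfx_zero(p, zn):
    """MOVPRFX Zd, Pg/Z, Zn: copy active lanes of zn, zero the inactive lanes."""
    return [x if active else 0.0 for active, x in zip(p, zn)]


def fdiv_merge(p, zdn, zm):
    """FDIV Zdn, Pg/M, Zdn, Zm: active lanes get zdn / zm, inactive lanes keep zdn."""
    return [x / y if active else x for active, x, y in zip(p, zdn, zm)]


def fdiv_zero_reference(p, b, c):
    """Intended FDIV_ZERO semantics: a[i] = p[i] ? b[i] / c[i] : 0."""
    return [x / y if active else 0.0 for active, x, y in zip(p, b, c)]


p1 = [True, False, True, False]
z2 = [8.0, 1.0, 9.0, 2.0]
z3 = [2.0, 5.0, 3.0, 4.0]

z0 = movprfx_zero(p1, z2)      # z0 = MOVPRFX p1/z, z2
z0 = fdiv_merge(p1, z0, z3)    # z0 = FDIV_MERGE p1/m, z0, z3
print(z0)  # [4.0, 0.0, 3.0, 0.0]
```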

Note that even if the register allocator could tie vz0 and vz2, we'd still need
the MOVPRFX to support zero-merging, so there's no advantage to using a tied
constraint.

But say instruction selection produces this:

vz0 = FDIV_ZERO vp1/z, vz2, vz2

Note that even though both sources use the same virtual register, we may not be
able to replace this instruction with a constant 1.0 in the active lanes due to
traps or IEEE conformance (for example, 0.0 / 0.0 and inf / inf are NaN, not
1.0), so we can't just delete it.
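A scalar Python model of one active lane shows why the division can't be
folded: x / x equals 1.0 only for finite, nonzero x. Python floats are an
assumed stand-in for SVE lanes here, and Python raises ZeroDivisionError for
0.0 / 0.0 where IEEE 754 hardware returns NaN, so the sketch treats that case
as a trap.

```python
import math


def self_quotient(x):
    """Compute x / x for one active lane; None stands in for a trap."""
    try:
        return x / x
    except ZeroDivisionError:
        # IEEE 754 hardware returns NaN for 0.0 / 0.0 and signals invalid
        # operation; Python raises instead, modeled here as a trap.
        return None


samples = [3.5, -2.0, math.inf, math.nan, 0.0]
quotients = [self_quotient(x) for x in samples]
folds_to_one = [q == 1.0 for q in quotients]
print(folds_to_one)  # [True, True, False, False, False]
```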

Let's say instead of not_all_same we'd specified the constraint as "$Zdn =
$_Zdn" (i.e. tied) or had no constraints at all. Register allocation could
produce this without any extra copies:

z0 = FDIV_ZERO p1, z0, z0 # vz2 was dead after this instruction

When we go to expand the pseudo we might naively do this:

z0 = MOVPRFX p1/z, z0
z0 = FDIV_MERGE p1/m, z0, z0

Except now we've violated the constraint on MOVPRFX that its destination can
only appear on the tied source operand and nowhere else. So we need an extra
MOV:

z1 = MOV z0                    # Satisfy the MOVPRFX constraint
z0 = MOVPRFX p1/z, z0          # Needed to make FDIV_MERGE zero-merging
z0 = FDIV_MERGE p1/m, z0, z1
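The MOVPRFX restriction is easy to check mechanically. The sketch below is a
hypothetical Python checker over a toy instruction list (Inst and movprfx_ok
are illustrative names, not LLVM classes); it models only the rule as stated
here, omits predicate operands, and confirms that the naive expansion fails
while the version with the extra MOV passes.

```python
from typing import NamedTuple, Tuple


class Inst(NamedTuple):
    dst: str
    opcode: str
    srcs: Tuple[str, ...]  # vector sources only; srcs[0] is the tied source, if any


def movprfx_ok(seq):
    """Check each MOVPRFX against the instruction it prefixes.

    Modeled rule: the MOVPRFX destination must be the destination and tied
    source of the next instruction and must not appear in any other source.
    """
    for i, inst in enumerate(seq):
        if inst.opcode != "MOVPRFX":
            continue
        if i + 1 == len(seq):
            return False  # nothing to prefix
        nxt = seq[i + 1]
        tied, *others = nxt.srcs
        if nxt.dst != inst.dst or tied != inst.dst or inst.dst in others:
            return False
    return True


# Naive expansion of z0 = FDIV_ZERO p1, z0, z0.
naive = [
    Inst("z0", "MOVPRFX", ("z0",)),
    Inst("z0", "FDIV_MERGE", ("z0", "z0")),
]
# Expansion with the extra MOV into z1.
with_mov = [
    Inst("z1", "MOV", ("z0",)),
    Inst("z0", "MOVPRFX", ("z0",)),
    Inst("z0", "FDIV_MERGE", ("z0", "z1")),
]
print(movprfx_ok(naive), movprfx_ok(with_mov))  # False True
```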

If we use the not_all_same constraint instead, register allocation can assign a
different register to vz0:

z1 = FDIV_ZERO p1, z0, z0

Now we can easily expand this without an extra MOV:

z1 = MOVPRFX p1/z, z0          # Copies z0 and makes FDIV_MERGE zero-merging
z1 = FDIV_MERGE p1/m, z1, z0
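The expansion fits in a few lines if the expander may swap FDIV_MERGE for
FDIVR_MERGE when the destination coincides with the divisor. That swap is an
assumption of this hypothetical Python sketch (expand_fdiv_zero is not an LLVM
function); under it, an exhaustive check over three registers finds a MOV-free
expansion for exactly the assignments that not_all_same permits.

```python
from itertools import product


def expand_fdiv_zero(dst, src1, src2):
    """Hypothetical MOV-free expansion of dst = FDIV_ZERO p1/z, src1, src2.

    Returns (dst, opcode, srcs) tuples with predicate operands omitted, or
    None when no MOV-free expansion exists. Assumes the expander may use
    FDIVR_MERGE (active lanes: zm / zdn) in place of FDIV_MERGE.
    """
    if dst != src2:
        # dst is only the tied source of the divide, so MOVPRFX is legal.
        return [(dst, "MOVPRFX", (src1,)), (dst, "FDIV_MERGE", (dst, src2))]
    if dst != src1:
        # dst == src2: prefix with the divisor and compute src1 / dst instead.
        return [(dst, "MOVPRFX", (src2,)), (dst, "FDIVR_MERGE", (dst, src1))]
    return None  # dst == src1 == src2: the case not_all_same rules out


# A MOV-free expansion exists exactly when the registers are not all the same.
regs = ["z0", "z1", "z2"]
for dst, src1, src2 in product(regs, repeat=3):
    seq = expand_fdiv_zero(dst, src1, src2)
    assert (seq is None) == (dst == src1 == src2)
    if seq is not None:
        _, _, (tied, other) = seq[1]
        assert tied == dst and other != dst  # MOVPRFX destination only in tied slot

print(expand_fdiv_zero("z1", "z0", "z0"))
# [('z1', 'MOVPRFX', ('z0',)), ('z1', 'FDIV_MERGE', ('z1', 'z0'))]
```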

Note that an @earlyclobber constraint will not save us because it would
pessimize the lowering of this:

vz0 = FDIV_ZERO vp1/z, vz2, vz3

If vz2 were dead after the FDIV_ZERO, we could allocate vz0 and vz2 to the same
register and trivially lower the FDIV_ZERO. If vz0 had an @earlyclobber
constraint, we would not be able to allocate vz0 and vz2 to the same register,
pessimizing register allocation. Similarly, if one of vz2 or vz3 were
@earlyclobber, then vz2 and vz3 could not be the same register, which would
require an extra MOV for our second example above.
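The cost of @earlyclobber shows up when counting the register assignments each
constraint allows. In this hypothetical Python model, @earlyclobber on $Zdn is
taken to mean the destination may not share a register with any source; the
model covers only that destination case and enumerates assignments of
($Zdn, $_Zdn, $Zm) over three physical registers.

```python
from itertools import product


def earlyclobber_dst(zdn, _zdn, zm):
    """Model of @earlyclobber $Zdn: the destination shares no register with a source."""
    return zdn != _zdn and zdn != zm


def not_all_same(zdn, _zdn, zm):
    """Model of not_all_same($Zdn, $_Zdn, $Zm)."""
    return not (zdn == _zdn == zm)


regs = ["z0", "z1", "z2"]
allowed_ec = {a for a in product(regs, repeat=3) if earlyclobber_dst(*a)}
allowed_nas = {a for a in product(regs, repeat=3) if not_all_same(*a)}
print(len(allowed_ec), len(allowed_nas))  # 12 24

# vz0 = FDIV_ZERO vp1/z, vz2, vz3 with vz2 dead: reusing z2 for vz0 is cheapest.
reuse_dead_source = ("z2", "z2", "z0")
print(reuse_dead_source in allowed_ec, reuse_dead_source in allowed_nas)  # False True
```

Every assignment the @earlyclobber model allows is also allowed by
not_all_same, but not the reverse, and the lost assignments include the one
that reuses a dead source register.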

Since neither the tied nor the @earlyclobber constraint can express the needed
semantics, we need a new constraint type that accurately expresses the register
allocation restrictions on these zero-merging pseudo-instructions, which is the
role that not_all_same plays.