Previously we mostly supported .vv operands for intrinsics where
there was a .vv instruction. Otherwise we only supported scalar
operands for the .vx and .vf instructions. The C interface defines
.vv intrinsics even when an instruction doesn't exist since the
operands can be swapped to use another instruction. We were going
to handle this in the frontend, but I think it makes the IR interface
confusing. So this patch proposes to handle all the .vv cases in
the backend by swapping operands where needed.
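For example (a sketch; vmsgt.vv and vmfge.vv have no encodings, but
vmslt.vv and vmfle.vv do, so we can commute the operands):

vmsgt.vv vd, va, vb  ==>  vmslt.vv vd, vb, va
vmfge.vv vd, va, vb  ==>  vmfle.vv vd, vb, va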
I've also added vmsge(u) intrinsics with support for .vv, .vx, and
.vi. .vv will swap operands and use vmsle(u).vv. .vi will adjust the
immediate and use vmsgt(u).vi. For .vx we need to use the
multi-instruction sequences from the V extension spec.
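For example, the two single-instruction rewrites look like this
(the immediate is decremented since va >= 5 iff va > 4):

vmsge.vv vd, va, vb  ==>  vmsle.vv vd, vb, va
vmsge.vi vd, va, 5   ==>  vmsgt.vi vd, va, 4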
For unmasked vmsge(u).vx we use:

vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd

(The vmnand complements vd, turning va < x into va >= x.)
For cases where the mask and maskedoff are the same value, we have
vmsge{u}.vx v0, va, x, v0.t, which is the vd==v0 case; that
requires a temporary, so we use:

vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt
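Per lane, this is why the sequence is correct (on entry vd holds the
mask, which also equals maskedoff):

active lane   (vd=1): vd = 1 & ~(va < x) = (va >= x)
inactive lane (vd=0): vd = 0 & ~(va < x) = 0 = maskedoff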
For other masked cases we use this sequence:
vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
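Per lane (vd is tied to maskedoff, and masked-off lanes are left
undisturbed by the vmslt, assuming the mask-undisturbed policy):

active lane   (v0=1): vd = (va < x) ^ 1 = (va >= x)
inactive lane (v0=0): vd = maskedoff ^ 0 = maskedoff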
We trust that register allocation will prevent vd in vmslt{u}.vx
from being v0 since v0 is still needed by the vmxor.
Note that we end up using v9 as the third operand of the vmxor instead
of v0. But that's okay because we copied v9 to v0 before the vmslt.vx,
so v9 and v0 hold the same value, and neither v9 nor v0 was clobbered
by the vmslt.vx.
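A possible emitted sequence for that case (register numbers are purely
illustrative):

vmv1r.v  v0, v9               # copy the mask from v9 into v0
vmslt.vx v25, v8, a0, v0.t    # vd = v25, kept distinct from v0 by RA
vmxor.mm v25, v25, v9         # v9 holds the same value as v0; neither
                              # was clobbered by the vmslt.vx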