This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Proposal to use op+select to match scalable predicated operations
AbandonedPublic

Authored by cameron.mcinally on Dec 12 2019, 1:15 PM.

Details

Summary

As promised from the SVE LLVM Sync-up today, here is a proposal to support scalable op+select masks in the backend. A solution like this would allow us to generate predicated instructions directly from IR, without the need for target intrinsics.
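For concreteness, the kind of IR this proposal targets looks roughly like the following (the function name and types here are illustrative, not taken from the patch):

```llvm
; An unpredicated fadd followed by a vselect that keeps the original
; lanes where the predicate is false. The proposed patterns would match
; this op+select pair to a single predicated FADD.
define <vscale x 4 x float> @masked_fadd(<vscale x 4 x i1> %pg,
                                         <vscale x 4 x float> %a,
                                         <vscale x 4 x float> %b) {
  %add = fadd <vscale x 4 x float> %a, %b
  %res = select <vscale x 4 x i1> %pg,
                <vscale x 4 x float> %add,
                <vscale x 4 x float> %a
  ret <vscale x 4 x float> %res
}
```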

Also note that this is a temporary solution. The native vector predication project D57504 will eventually obsolete this work.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald Transcript

Remove some leftover debugging junk...

Adding patterns for vselect of various operations seems reasonable in general. The patterns are simple enough that it's not a big deal to repeat for a bunch of instructions.

For floating-point ops in particular, I'm sort of wondering how this interacts with STRICT_* operations. I think these patterns should not match in that case? We'd be suppressing exceptions that would otherwise trigger. Not sure how important that is.

llvm/lib/Target/AArch64/SVEInstrFormats.td
1284

If we're going to stick this style of pattern all over the place, we probably want a "class" for it.
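One possible shape for such a class, sketched here with made-up names and operand ordering (the actual patch defines its own classes, which may differ):

```tablegen
// Hypothetical helper: match (vselect Pg, (op Zdn, Zm), Zdn) to a
// predicated destructive instruction. Names are illustrative only.
class SVE_Op_Sel_Pat<ValueType vt, SDPatternOperator op,
                     ValueType pt, Instruction inst>
  : Pat<(vt (vselect pt:$Pg, (vt (op vt:$Zdn, vt:$Zm)), vt:$Zdn)),
        (inst $Pg, $Zdn, $Zm)>;
```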

Adding patterns for vselect of various operations seems reasonable in general. The patterns are simple enough that it's not a big deal to repeat for a bunch of instructions.

I'm sure you know, but for others, this is how AVX512 handles predication. We originally began adding predicated intrinsics for every AVX512 instruction, but they were later replaced with op+select patterns (except for masked loads/stores/gathers/scatters and a few others). I'll admit this isn't an apples-to-apples comparison though, since AVX512 has 5000+ predicated instructions. I don't know the SVE predicated instruction count, but my intuition says it's much lower (please correct me if I'm wrong).

For floating-point ops in particular, I'm sort of wondering how this interacts with STRICT_* operations. I think these patterns should not match in that case? We'd be suppressing exceptions that would otherwise trigger. Not sure how important that is.

It's actually the other way around. Vectorization would need the selects in place to suppress exceptions guarded by conditions. E.g.:

for(...)
  if (b[i] != 0)
    a[i] = a[i]/b[i];

My current understanding (and I could be wrong) is that the native predication intrinsics and the constrained intrinsics will be merged into one set. So once we have native predication + constrained intrinsics, these op+select patterns can go away on all targets. The op+select patterns are just a stop-gap.

My current understanding (and I could be wrong) is that the native predication intrinsics and the constrained intrinsics will be merged into one set

My understanding is that we're going to get masked constrained intrinsics, yes. Not sure if that means the other versions are going away completely; we still probably want some way to represent vector operations on targets without predication, at least in SelectionDAG.

In the meantime, we have to generate all the exceptions implied by a strict vector operation. We can't change the exceptions based on the uses of the instruction. Consider the following:

for(...) {
  double c = a[i]/b[i];
  if (b[i] != 0)
    a[i] = c;
}

Ah, ok. I see your point now.

for(...) {
  double c = a[i]/b[i];
  if (b[i] != 0)
    a[i] = c;
}

For this particular case though, the select should follow the store, not the fdiv. That should be ok: the store should become a masked store and the fdiv would remain unpredicated.
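To illustrate that point, the vectorized IR for the guarded-store example would keep the fdiv unpredicated and push the mask onto the store. A sketch (modern opaque-pointer syntax; `%aptr` and the types are illustrative, `llvm.masked.store` is the existing masked-store intrinsic):

```llvm
%c    = fdiv <vscale x 2 x double> %a, %b                  ; executed unconditionally
%mask = fcmp une <vscale x 2 x double> %b, zeroinitializer ; b[i] != 0
call void @llvm.masked.store.nxv2f64.p0(<vscale x 2 x double> %c,
                                        ptr %aptr, i32 8,
                                        <vscale x 2 x i1> %mask)
```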

Add a class for the op+select patterns.

cameron.mcinally marked an inline comment as done.Dec 13 2019, 12:56 PM
simoll added a subscriber: simoll.Dec 16 2019, 12:30 AM

Hi @cameron.mcinally, thanks for sharing this patch!

For the purpose of merging a select with an unpredicated operation into a predicated operation, this is indeed sufficient. But I wonder if we need something a bit more elaborate if the intended purpose is to more cheaply select a value for the false-lanes (passthru).

While we don't support the generic case in our downstream compiler, we do have special support for the cases where the false lanes are zeroed or undef. Using the predicated MOVPRFX instruction, the false lanes can be zeroed relatively cheaply using e.g.:

movprfx z0.s, p0/z, z1.s
fsub z0.s, p0/m, z0.s, z2.s

This avoids having to emit an explicit sequence of a splat and select / predicated mov to zero the false lanes. We match the operation + select into a Pseudo instruction (e.g. FSUB_ZERO or FSUB_UNDEF), which is expanded after register allocation (in the AArch64ExpandPseudoInsts pass) into the appropriate instructions.

Even if we don't care about selecting a passthru value for the false lanes, there is still value in creating the Pseudo. The lack of a tied-operand constraint for the Pseudo gives the register allocator more freedom to come up with a better allocation. Combined with the commutative property of some instructions or by expanding to their reversed variants (like SUBR vs SUB), we can avoid a number of unnecessary register moves.

We've been thinking about some ideas on how to make this support more generic to allow supporting the general use-case of:

%Res = FSUB_PSEUDO(%Pred, %Op1, %Op2, %Passthru)

Depending on the value for %Passthru, this can be expanded to use a movprfx or in the worst case an explicit select.

Ideally we'd use a Pseudo for most operations so that we can use this as a generic mechanism that natively supports the passthru value and benefits from better register allocation.

A bit of prototyping would be required though, as our downstream compiler only covers a limited use-case. We've also had to deal with some corner-cases, but I'd need to refresh my memory on the details before I can comment on those. I'll try to dig up some more details!

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
163–166

nit: the _pred isn't needed, as this is already implied by the _p.

Matt added a subscriber: Matt.Dec 18 2019, 12:57 PM

While we don't support the generic case in our downstream compiler, we do have special support for the cases where the false lanes are zeroed or undef. Using the predicated MOVPRFX instruction, the false lanes can be zeroed relatively cheaply using e.g.:

movprfx z0.s, p0/z, z1.s
fsub z0.s, p0/m, z0.s, z2.s

This avoids having to emit an explicit sequence of a splat and select / predicated mov to zero the false lanes. We match the operation + select into a Pseudo instruction (e.g. FSUB_ZERO or FSUB_UNDEF), which is expanded after register allocation (in the AArch64ExpandPseudoInsts pass) into the appropriate instructions.

Even if we don't care about selecting a passthru value for the false lanes, there is still value in creating the Pseudo. The lack of a tied-operand constraint for the Pseudo gives the register allocator more freedom to come up with a better allocation. Combined with the commutative property of some instructions or by expanding to their reversed variants (like SUBR vs SUB), we can avoid a number of unnecessary register moves.

We've been thinking about some ideas on how to make this support more generic to allow supporting the general use-case of:

%Res = FSUB_PSEUDO(%Pred, %Op1, %Op2, %Passthru)

Depending on the value for %Passthru, this can be expanded to use a movprfx or in the worst case an explicit select.

Ideally we'd use a Pseudo for most operations so that we can use this as a generic mechanism that natively supports the passthru value and benefits from better register allocation.

A bit of prototyping would be required though, as our downstream compiler only covers a limited use-case. We've also had to deal with some corner-cases, but I'd need to refresh my memory on the details before I can comment on those. I'll try to dig up some more details!

Sorry for the slow reply... holidays.

I just checked out the PseudoOps and they're interesting. If we could make generic XXX_PSEUDO(..., %passthru) PseudoOps, that would be a good path forward. I don't foresee any problems adding an extra passthru operand to the existing patterns, but maybe I'm missing something. If you have any insight, it would be appreciated.

I agree that if these PseudoOps will land upstream, then the op+select solution isn't the right way to go. I'll see if I can build the PseudoOps out a bit.

That said, the current implementation is a little weird. Here's the class that something like an FADD_ZERO would use:

class SVE_3_Op_Pat_SelZero<ValueType vtd, SDPatternOperator op, ValueType vt1,
                   ValueType vt2, ValueType vt3, Instruction inst>
: Pat<(vtd (vtd (op vt1:$Op1, (vselect vt1:$Op1, vt2:$Op2, (SVEDup0)), vt3:$Op3))),
      (inst $Op1, $Op2, $Op3)>;

The select+op DAG reads like we're zeroing the inactive input elements, not the inactive output elements. I see that this directly models the movprfx+op hardware instructions, and that Op2 is an input reg as well as the destination reg, but it still seems counter-intuitive. I don't feel strongly that this needs to change though. So if I'm the only one that thinks it's weird, I'll let it drop.

I just checked out the PseudoOps and they're interesting. If we could make generic XXX_PSEUDO(..., %passthru) PseudoOps, that would be a good path forward. I don't foresee any problems adding an extra passthru operand to the existing patterns, but maybe I'm missing something. If you have any insight, it would be appreciated.

Awesome, thanks for checking this out.

I don't see many issues with adding the extra passthru operand, but it could do with a bit of prototyping; what we've done downstream was very specific to having the false lanes zeroed, and that is actually unnecessarily restrictive (we never really had to care for the generic case). The challenge would be in expanding the pseudo. For a pseudo that has a reverse variant, e.g. Dst = fsub_pseudo(Pg, Op1, Op2, Passthru),

we would for example expand this as follows:

z0 = fsub_pseudo_S(p0, z1, z2, z3)
  <=>
sel z0.s, p0.s, z1.s, z3.s
fsub z0.s, p0/m, z0.s, z2.s

z0 = fsub_pseudo_S(p0, z1, z2, z2)
  <=>
movprfx z0, z2
fsubr z0.s, p0/m, z0.s, z1.s

or a special case for zero'd lanes:

z0 = fsub_pseudo_S(p0, z1, z2, <zero>)
movprfx z0.s, p0/z, z1.s
fsub z0.s, p0/m, z0.s, z2.s

The indexed FMLA instruction has three input operands, and this is a case where we can't relax the register constraints:

%dst = Pseudo_FMLA %a, %n, %m, %index
(to implement: %dst = FMLA %a + %n * %m[%index])

We cannot recover from a register allocation where %index is used as %dst, e.g.

Z0 = Pseudo_FMLA Z1, Z2, Z0, <index>

(there is no other variant we can use to recover from this, so those will need to retain a constraint that $Zd = $Zs1)

I agree that if these PseudoOps will land upstream, then the op+select solution isn't the right way to go. I'll see if I can build the PseudoOps out a bit.

That said, the current implementation is a little weird. Here's the class that something like an FADD_ZERO would use:

class SVE_3_Op_Pat_SelZero<ValueType vtd, SDPatternOperator op, ValueType vt1,
                   ValueType vt2, ValueType vt3, Instruction inst>
: Pat<(vtd (vtd (op vt1:$Op1, (vselect vt1:$Op1, vt2:$Op2, (SVEDup0)), vt3:$Op3))),
      (inst $Op1, $Op2, $Op3)>;

The select+op DAG reads like we're zeroing the inactive input elements, not the inactive output elements. I see that this directly models the movprfx+op hardware instructions, and that Op2 is an input reg as well as the destination reg, but it still seems counter-intuitive. I don't feel strongly that this needs to change though. So if I'm the only one that thinks it's weird, I'll let it drop.

Yes, that's mostly because of the pattern that the front-end generates for zeroing the false lanes, but it is functionally the same. I guess we could have two patterns to map to the pseudo, depending on how it looks in the IR.

cameron.mcinally abandoned this revision.Jan 8 2020, 7:29 AM

I just checked out the PseudoOps and they're interesting. If we could make generic XXX_PSEUDO(..., %passthru) PseudoOps, that would be a good path forward. I don't foresee any problems adding an extra passthru operand to the existing patterns, but maybe I'm missing something. If you have any insight, it would be appreciated.

Awesome, thanks for checking this out.

I don't see many issues with adding the extra passthru operand, but it could do with a bit of prototyping; what we've done downstream was very specific to having the false lanes zeroed, and that is actually unnecessarily restrictive (we never really had to care for the generic case). The challenge would be in expanding the pseudo. For a pseudo that has a reverse variant, e.g. Dst = fsub_pseudo(Pg, Op1, Op2, Passthru),

we would for example expand this as follows:

z0 = fsub_pseudo_S(p0, z1, z2, z3)
  <=>
sel z0.s, p0.s, z1.s, z3.s
fsub z0.s, p0/m, z0.s, z2.s

z0 = fsub_pseudo_S(p0, z1, z2, z2)
  <=>
movprfx z0, z2
fsubr z0.s, p0/m, z0.s, z1.s

or a special case for zero'd lanes:

z0 = fsub_pseudo_S(p0, z1, z2, <zero>)
movprfx z0.s, p0/z, z1.s
fsub z0.s, p0/m, z0.s, z2.s

The indexed FMLA instruction has three input operands, and this is a case where we can't relax the register constraints:

%dst = Pseudo_FMLA %a, %n, %m, %index
(to implement: %dst = FMLA %a + %n * %m[%index])

We cannot recover from a register allocation where %index is used as %dst, e.g.

Z0 = Pseudo_FMLA Z1, Z2, Z0, <index>

(there is no other variant we can use to recover from this, so those will need to retain a constraint that $Zd = $Zs1)

Ok, thanks. I'll abandon this Diff and start building my intuition around the Pseudos. Will post some Diffs if I can make something work...

I agree that if these PseudoOps will land upstream, then the op+select solution isn't the right way to go. I'll see if I can build the PseudoOps out a bit.

That said, the current implementation is a little weird. Here's the class that something like an FADD_ZERO would use:

class SVE_3_Op_Pat_SelZero<ValueType vtd, SDPatternOperator op, ValueType vt1,
                   ValueType vt2, ValueType vt3, Instruction inst>
: Pat<(vtd (vtd (op vt1:$Op1, (vselect vt1:$Op1, vt2:$Op2, (SVEDup0)), vt3:$Op3))),
      (inst $Op1, $Op2, $Op3)>;

The select+op DAG reads like we're zeroing the inactive input elements, not the inactive output elements. I see that this directly models the movprfx+op hardware instructions, and that Op2 is an input reg as well as the destination reg, but it still seems counter-intuitive. I don't feel strongly that this needs to change though. So if I'm the only one that thinks it's weird, I'll let it drop.

Yes, that's mostly because of the pattern that the front-end generates for zeroing the false lanes, but it is functionally the same. I guess we could have two patterns to map to the pseudo, depending on how it looks in the IR.

The existing patterns should be fine. Just a personal preference. No big deal.