This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SME] Support NEON scalar FP instructions in streaming mode
ClosedPublic

Authored by c-rhodes on Aug 13 2021, 7:58 AM.

Details

Summary

The following scalar FP instructions are legal in streaming mode:

0101 1110 xx1x xxxx 11x1 11xx xxxx xxxx # FMULX/FRECPS/FRSQRTS (scalar)
0101 1110 x10x xxxx 00x1 11xx xxxx xxxx # FMULX/FRECPS/FRSQRTS (scalar, FP16)
01x1 1110 1x10 0001 11x1 10xx xxxx xxxx # FRECPE/FRSQRTE/FRECPX (scalar)
01x1 1110 1111 1001 11x1 10xx xxxx xxxx # FRECPE/FRSQRTE/FRECPX (scalar, FP16)

Predicate them on HasNEONorStreamingSVE. Full list of affected
instructions:

FMULX16, FMULX32, FMULX64, FRECPS16, FRECPS32, FRECPS64, FRSQRTS16,
FRSQRTS32, FRSQRTS64, FRECPEv1f16, FRECPEv1i32, FRECPEv1i64, FRECPXv1f16,
FRECPXv1i32, FRECPXv1i64, FRSQRTEv1f16, FRSQRTEv1i32, FRSQRTEv1i64

Depends on D107902.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SIMD-FP-Instructions

Execution of NEON instructions that are illegal in streaming mode will
cause a trap or exception. Using FMULX [1] as an example, this check is
at the top of the pseudocode:

if elements == 1 then
    CheckFPEnabled64();
else
    CheckFPAdvSIMDEnabled64();

For the legal scalar variants it calls CheckFPEnabled64, whereas for the
illegal vector variants it calls CheckFPAdvSIMDEnabled64, which traps in
streaming mode.

This check is a useful way to see which instructions are legal in
streaming mode and which are not.

[1] https://developer.arm.com/documentation/ddi0602/2021-06/SIMD-FP-Instructions/FMULX--Floating-point-Multiply-extended-

Event Timeline

c-rhodes created this revision. Aug 13 2021, 7:58 AM
c-rhodes requested review of this revision. Aug 13 2021, 7:58 AM
Herald added a project: Restricted Project. Aug 13 2021, 7:58 AM
david-arm added inline comments. Aug 16 2021, 3:05 AM
llvm/lib/Target/AArch64/AArch64InstrFormats.td
6832

Hi @c-rhodes, I think this might look nicer if we just put predicates around the instructions that derive from these classes instead, i.e.

let Predicates = [HasNEON] in {
  defm FABD     : SIMDFPThreeScalar<1, 1, 0b010, "fabd", int_aarch64_sisd_fabd>;
}

let Predicates = [HasNEONorStreamingSVE] in {
  defm FMULX    : SIMDFPThreeScalar<0, 0, 0b011, "fmulx", int_aarch64_neon_fmulx>;
  defm FRECPS   : SIMDFPThreeScalar<0, 0, 0b111, "frecps", int_aarch64_neon_frecps>;
  defm FRSQRTS  : SIMDFPThreeScalar<0, 1, 0b111, "frsqrts", int_aarch64_neon_frsqrts>;
}
7032

Is it possible to do something similar here too, i.e. predicates around the definitions rather than the multiclass? I believe that you can have nested "let Predicates = " commands, right?

c-rhodes added inline comments. Aug 16 2021, 3:37 AM
llvm/lib/Target/AArch64/AArch64InstrFormats.td
6832

That would break the FP16 variant, which requires +fullfp16.

7032

let statements can be nested, but the outer let Predicates = ... will override the inner predicate.
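
The override described above can be sketched in standalone TableGen. Everything here (SketchInst, SketchFPThreeScalar, the predicate condition strings) is a hypothetical stand-in rather than the real AArch64 definitions. The point is only that the multiclass picks a predicate list for its FP16 def, and a let Predicates wrapped around the defm, as suggested in the review, would replace that list.

```tablegen
// Hypothetical stand-ins for illustration only; not the definitions from
// llvm/lib/Target/AArch64.
class Predicate<string cond> {
  string CondString = cond;
}
def HasNEON               : Predicate<"hasNEON">;
def HasFullFP16           : Predicate<"hasFullFP16">;
def HasNEONorStreamingSVE : Predicate<"hasNEON || hasStreamingSVE">;

class SketchInst<string asm> {
  string AsmString = asm;
  list<Predicate> Predicates = [];
}

// The multiclass hard-codes its predicates; the FP16 def additionally
// needs HasFullFP16.
multiclass SketchFPThreeScalar<string asm> {
  let Predicates = [HasNEON] in {
    def NAME#64 : SketchInst<asm>;
    def NAME#32 : SketchInst<asm>;
  }
  let Predicates = [HasNEON, HasFullFP16] in
    def NAME#16 : SketchInst<asm>;
}

// Outer let around the defm. Per the review discussion, this replaces the
// list chosen inside the multiclass, so FMULX16 would no longer carry
// HasFullFP16.
let Predicates = [HasNEONorStreamingSVE] in
  defm FMULX : SketchFPThreeScalar<"fmulx">;
```

Dumping the records from a file like this with llvm-tblgen --print-records is one way to check which predicate list each def ends up with.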

c-rhodes updated this revision to Diff 366619. Aug 16 2021, 7:10 AM

Pass predicate as operand to multiclass rather than check opcode bits.
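
A minimal standalone sketch of passing the predicate as a multiclass argument might look like the following. It is not the code from this revision: SketchInst, SketchFPThreeScalar, and the predicate condition strings are hypothetical stand-ins, and defaulting the parameter to HasNEON is an assumption made for the example.

```tablegen
// Hypothetical stand-ins for illustration only; not the definitions from
// llvm/lib/Target/AArch64.
class Predicate<string cond> {
  string CondString = cond;
}
def HasNEON               : Predicate<"hasNEON">;
def HasFullFP16           : Predicate<"hasFullFP16">;
def HasNEONorStreamingSVE : Predicate<"hasNEON || hasStreamingSVE">;

class SketchInst<string asm> {
  string AsmString = asm;
  list<Predicate> Predicates = [];
}

// The base predicate is a parameter (defaulting to HasNEON, an assumption
// of this sketch). The FP16 def adds HasFullFP16 on top of it inside the
// multiclass, so no outer let is needed at the use site.
multiclass SketchFPThreeScalar<string asm, Predicate pred = HasNEON> {
  let Predicates = [pred] in {
    def NAME#64 : SketchInst<asm>;
    def NAME#32 : SketchInst<asm>;
  }
  let Predicates = [pred, HasFullFP16] in
    def NAME#16 : SketchInst<asm>;
}

// NEON-only instruction: relies on the default predicate.
defm FABD  : SketchFPThreeScalar<"fabd">;

// Instruction that is also legal in streaming mode: passes the wider
// predicate explicitly.
defm FMULX : SketchFPThreeScalar<"fmulx", HasNEONorStreamingSVE>;
```

With this shape, each defm states its own base predicate, and the FP16 requirement stays inside the multiclass where an outer let cannot discard it.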

c-rhodes added inline comments. Aug 16 2021, 7:10 AM
llvm/lib/Target/AArch64/AArch64InstrFormats.td
6832

I've changed it to pass the predicate as an argument; hopefully that makes it clearer.

david-arm accepted this revision. Aug 16 2021, 7:50 AM

LGTM! This looks a lot nicer now, thanks @c-rhodes. :)

This revision is now accepted and ready to land. Aug 16 2021, 7:50 AM
Matt added a subscriber: Matt. Aug 18 2021, 1:58 PM
This revision was landed with ongoing or failed builds. Aug 23 2021, 2:24 AM
This revision was automatically updated to reflect the committed changes.