This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Add bfloat16 support to perm and select intrinsics
ClosedPublic

Authored by c-rhodes on Jun 19 2020, 6:50 AM.

Details

Summary

Added for following intrinsics:

  • zip1, zip2, zip1q, zip2q
  • trn1, trn2, trn1q, trn2q
  • uzp1, uzp2, uzp1q, uzp2q
  • splice
  • rev
  • sel

Diff Detail

Event Timeline

c-rhodes created this revision.Jun 19 2020, 6:50 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 19 2020, 6:50 AM
fpetrogalli added inline comments.Jun 19 2020, 8:15 AM
clang/include/clang/Basic/arm_sve.td
1191

nit: could create a multiclass here like @sdesmalen have done in https://reviews.llvm.org/D82187, seems quite a nice way to keep the definition of the intrinsics together (look for multiclass StructLoad, for example)

1374

Same here, could use a multiclass to merge the "regular" intrinsics definition with the BF ones.

This revision is now accepted and ready to land.Jun 19 2020, 9:26 AM
c-rhodes added inline comments.Jun 19 2020, 9:56 AM
clang/include/clang/Basic/arm_sve.td
1191

it might be a bit tedious having separate multiclasses, what do you think about:

multiclass SInstBF16<string n, string p, string t, MergeType mt, string i = "",
                     list<FlagType> ft = [], list<ImmCheck> ch = []> {
  def : SInst<n, p, t, mt, i, ft, ch>;
  let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
    def : SInst<n, p, "b", mt, i, ft, ch>;
  }
}

defm SVREV    : SInstBF16<"svrev[_{d}]",    "dd",   "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_rev">;
defm SVSEL    : SInstBF16<"svsel[_{d}]",    "dPdd", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_sel">;
defm SVSPLICE : SInstBF16<"svsplice[_{d}]", "dPdd", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_splice">;
defm SVTRN1   : SInstBF16<"svtrn1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_trn1">;
defm SVTRN2   : SInstBF16<"svtrn2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_trn2">;
defm SVUZP1   : SInstBF16<"svuzp1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_uzp1">;
defm SVUZP2   : SInstBF16<"svuzp2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_uzp2">;
defm SVZIP1   : SInstBF16<"svzip1[_{d}]",   "ddd",  "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_zip1">;
defm SVZIP2   : SInstBF16<"svzip2[_{d}]",   "ddd",  "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_zip2">;

?

c-rhodes added inline comments.Jun 23 2020, 3:13 AM
clang/include/clang/Basic/arm_sve.td
1191

I've played around with this and it works great for instructions guarded on a single feature flag but falls apart for the .Q forms that also require __ARM_FEATURE_SVE_MATMUL_FP64. I suspect there's a nice way of handling it in tablegen by passing the features as a list of strings and joining them but I spent long enough trying to get that to work so I'm going to keep it simple for now.

sdesmalen added inline comments.Jun 23 2020, 3:44 AM
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_sel.c
2 ↗(On Diff #272050)

Can you move the clang bfloat tests to separate files and add a RUN line similar to what we've done for the sve2 tests (to check that we get a diagnostic if +bf16 is not specified) ?

c-rhodes added inline comments.Jun 23 2020, 5:32 AM
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_sel.c
2 ↗(On Diff #272050)

I can move the bfloat tests to separate files but I'm not sure about the RUN line, if +bf16 is omitted we get the following:

/home/culrho01/llvm-project/build/bin/clang -cc1 -internal-isystem /home/culrho01/llvm-project/build/lib/clang/11.0.0/include -nostdsysteminc -D__ARM_FEATURE_SVE -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -D__ARM_FEATURE_SVE_BF16 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -fsyntax-only -verify /home/culrho01/llvm-project/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev-bfloat.c
error: no expected directives found: consider use of 'expected-no-diagnostics'
error: 'error' diagnostics seen but not expected:
  File /home/culrho01/llvm-project/build/lib/clang/11.0.0/include/arm_bf16.h Line 14: __bf16 is not supported on this target
  File /home/culrho01/llvm-project/build/lib/clang/11.0.0/include/arm_sve.h Line 52: __bf16 is not supported on this target
3 errors generated.

Whereas I think the desired behaviour we want to test as we do for sve2 is checking the intrinsics are guarded with the right feature flag, which at the moment would be omitting -D__ARM_FEATURE_SVE_BF16 from the RUN line, until +bf16 implies -D__ARM_FEATURE_SVE_BF16 anyway, which is when the ACLE is fully complete. Should I do that? I guess we'd want to update this RUN line to omit +bf16 when it implies the feature macro

c-rhodes updated this revision to Diff 272745.Jun 23 2020, 9:06 AM

Changes:

  • Moved bfloat tests to separate files.
  • Added checks to test intrinsics are guarded by feature flag, this is by omitting the feature macro __ARM_FEATURE_SVE_BF16 for now but will eventually be updated to omit +bf16 once the feature flag implies the macro.
c-rhodes marked an inline comment as done.Jun 23 2020, 9:06 AM
fpetrogalli added inline comments.Jun 23 2020, 11:19 AM
clang/include/clang/Basic/arm_sve.td
1191

it might be a bit tedious having separate multiclasses, what do you think about:

Sorry I think I misunderstood you when we last discussed this. I didn't mean to write a multiclass that would work for ALL intrinsics that uses regular types and bfloats.... I just meant to merge together those who were using the same archguard and that you are adding in this patch.

I think you could keep both macros in a single ArchGuard string:

multiclass SInstPerm<string n, string p, MergeType mt, string i> {
  def : SInst<n, p, "csilUcUsUiUlhfd",  mt, i>;
  let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
    def : SInst<n, p, "b", mt, i>;
  }
}

defm SVREV    : SInstPerm<"svrev[_{d}]",    "dd",    MergeNone, "aarch64_sve_rev">;
...

multiclass SInstPermMatmul<string n, string p, MergeType mt, string i> {
  def : SInst<n, p, "csilUcUsUiUlhfd",  mt, i>;
  let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16) && defined(__ARM_FEATURE_SVE_MATMUL_FP64)" in {
    def : SInst<n, p, "b", mt, i>;
  }
}

def SVTRN1Q : SInstPermMatmul ...
...
c-rhodes added inline comments.Jun 24 2020, 2:52 AM
clang/include/clang/Basic/arm_sve.td
1191

Sure, I understood you meant separate multiclasses for each intrinsic / group similar to what Sander implemented for structured loads / stores but I thought it would be quite abit of extra code to implement that, hence why I proposed a single multiclass that could handle this. I've experimented with the SInstBF16 multiclass I mentioned above and have it working with an extra arg for arch features. I'll create a follow up patch and if people are happy with it we'll move forward with that, otherwise I'm happy to implement your suggestion.

This revision was automatically updated to reflect the committed changes.
sdesmalen added inline comments.Jun 24 2020, 4:02 AM
llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
809

Does this test not need the +bf16 attribute to work? (which implies the patterns are missing the right predicate)

c-rhodes marked an inline comment as done.Jun 24 2020, 4:06 AM
c-rhodes added inline comments.
clang/include/clang/Basic/arm_sve.td
1191

I'll create a follow up patch

https://reviews.llvm.org/D82450

c-rhodes added inline comments.Jun 24 2020, 5:07 AM
llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
809

It should do, sorry I missed that. I've tried:

diff --git a/llvm/lib/Target/AArch64/SVEInstrFormats.td b/llvm/lib/Target/AArch64/SVEInstrFormats.td
index 46cca2a..5ab2502 100644
--- a/llvm/lib/Target/AArch64/SVEInstrFormats.td
+++ b/llvm/lib/Target/AArch64/SVEInstrFormats.td
@@ -1124,10 +1124,13 @@ multiclass sve_int_perm_reverse_z<string asm, SDPatternOperator op> {
   def : SVE_1_Op_Pat<nxv4i32, op, nxv4i32, !cast<Instruction>(NAME # _S)>;
   def : SVE_1_Op_Pat<nxv2i64, op, nxv2i64, !cast<Instruction>(NAME # _D)>;

-  def : SVE_1_Op_Pat<nxv8bf16, op, nxv8bf16, !cast<Instruction>(NAME # _H)>;
   def : SVE_1_Op_Pat<nxv8f16,  op, nxv8f16,  !cast<Instruction>(NAME # _H)>;
   def : SVE_1_Op_Pat<nxv4f32,  op, nxv4f32,  !cast<Instruction>(NAME # _S)>;
   def : SVE_1_Op_Pat<nxv2f64,  op, nxv2f64,  !cast<Instruction>(NAME # _D)>;
+
+  let Predicates = [HasBF16] in {
+    def : SVE_1_Op_Pat<nxv8bf16, op, nxv8bf16, !cast<Instruction>(NAME # _H)>;
+  }
 }

but this still works without +bf16. I noticed in your patch D82187 you check Subtarget->hasBF16() for MVT::nxv8bf16 at select phase of ISEL, I guess it's different here with patterns. I also noticed we add the register class for MVT::nxv8bf16 in AArch64ISelLowering without checking Subtarget->hasBF16() which I suspect is a bug. This test requires +bf16 with that fixed but I wonder why the predicate isn't being recognised.