This is an archive of the discontinued LLVM Phabricator instance.

[Clang][AArch64][SME] Add vector load/store (ld1/st1) intrinsics
ClosedPublic

Authored by bryanpkc on Jun 15 2022, 3:44 PM.

Details

Summary

[Clang][AArch64][SME] Add vector load/store (ld1/st1) intrinsics

This patch adds support for the following SME ACLE intrinsics (as defined
in https://arm-software.github.io/acle/main/acle.html):

  • svld1_hor_za8 // also for _za16, _za32, _za64 and _za128
  • svld1_hor_vnum_za8 // also for _za16, _za32, _za64 and _za128
  • svld1_ver_za8 // also for _za16, _za32, _za64 and _za128
  • svld1_ver_vnum_za8 // also for _za16, _za32, _za64 and _za128
  • svst1_hor_za8 // also for _za16, _za32, _za64 and _za128
  • svst1_hor_vnum_za8 // also for _za16, _za32, _za64 and _za128
  • svst1_ver_za8 // also for _za16, _za32, _za64 and _za128
  • svst1_ver_vnum_za8 // also for _za16, _za32, _za64 and _za128

SveEmitter.cpp is extended to generate arm_sme.h (currently named
arm_sme_draft_spec_subject_to_change.h) and other SME definitions from
arm_sme.td, which is modeled after arm_sve.td. Common TableGen definitions
are moved into arm_sve_sme_incl.td.

Co-authored-by: Sagar Kulkarni <sagar.kulkarni1@huawei.com>

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2022, 3:44 PM
sagarkulkarni19 requested review of this revision.Jun 15 2022, 3:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2022, 3:44 PM

Hi @sagarkulkarni19, this patch will have to wait until the ABI is implemented so that the builtins can be decorated with the attributes. A first patch proposing these attributes is in D127762. LLVM patches that use these attributes to implement the ABI will follow soon. We could add you as reviewer to these patches if you'd like to help us with that?

clang/include/clang/Basic/arm_sve.td
16

Is there value in having both IsSME and IsSMELoadStore?

clang/lib/Basic/Targets/AArch64.cpp
425

When this macro is non-zero, it suggests that the compiler implements the full SME ACLE. That is currently not yet the case, so until then we should leave this macro undefined.

Hi @sagarkulkarni19, this patch will have to wait until the ABI is implemented so that the builtins can be decorated with the attributes. A first patch proposing these attributes is in D127762. LLVM patches that use these attributes to implement the ABI will follow soon. We could add you as reviewer to these patches if you'd like to help us with that?

Hi @sdesmalen, sounds good. I can update this patch accordingly when the attributes patches are in. Sure, please add me as a reviewer to those patches. We are porting much of our downstream C intrinsics implementation to support the ACLE instead and can contribute this effort upstream. I have also created an article on discourse (https://discourse.llvm.org/t/clang-sme-acle-intrinsics/63223) for further discussion on this.

clang/include/clang/Basic/arm_sve.td
16

IsSME flag is checked in the SveEmitter.cpp : createSMEHeader(...) to generate arm_sme.h with only the SME intrinsics, whereas IsSMELoadStore is used to correctly CodeGen (CGBuiltins.cpp) load and store intrinsics.

clang/lib/Basic/Targets/AArch64.cpp
425

Okay makes sense. But until all SME ACLE is implemented, do we need some sort of guard in the meantime while the SME intrinsics that are implemented incrementally?

Matt added a subscriber: Matt.Jun 16 2022, 4:43 PM
sdesmalen added inline comments.Jun 17 2022, 4:29 AM
clang/lib/Basic/Targets/AArch64.cpp
425

The tests need to have manual -D__ARM_FEATURE_SME in the RUN lines. Once the feature is implemented and we add the definition automatically when +sme is specified, we update the tests. See for example D81725 where we did this for SVE.

sagarkulkarni19 edited the summary of this revision. (Show Details)

Updated testcases and also added the vnum variant of the ld1 and st1 intrinsics.

sagarkulkarni19 marked 2 inline comments as done.Jun 24 2022, 4:40 PM
sagarkulkarni19 added inline comments.
clang/lib/Basic/Targets/AArch64.cpp
425

Thanks. That makes sense, now the macro is undefined and I have updated the testcases accordingly.

sagarkulkarni19 marked an inline comment as done.Jun 24 2022, 5:19 PM
  • Support Opaque pointers
  • Correct predicate types for the intrinsics.
  • Decorate intrinsics with SME attributes.

Update testcases by adding "arm_streaming" attribute to the callee.

Hi @sagarkulkarni19, thank you for working on the ACLE builtins for SME! I've had a look through and I have a few comments, mostly around how the code is structured. It would be good if you could try to separate SVE from SME in this implementation, in the same way that NEON and SVE are distinct. It's possible to do this whilst reusing much of the code in SveEmitter.cpp.

clang/include/clang/Basic/arm_sve.td
16

You don't need the IsSME flag either because in CodeGenFunction::EmitAArch64BuiltinExpr you can do exactly the same thing as SVE, i.e. something like

if (BuiltinID >= AArch64::FirstSMEBuiltin &&
    BuiltinID <= AArch64::LastSMEBuiltin)
  return EmitAArch64SMEBuiltinExpr(BuiltinID, E);
17

We don't need these new flags 'IsSMELd1' and 'IsSMESt1'. Please can you reuse the existing IsLoad and IsStore flags?

301

SME is really a distinct architecture and so I think it should live in it's own arm_sme.td file in the same way that we have arm_neon.td and arm_sve.td. It's possible to do this and still reuse the SveEmitter.cpp code. If you look at SveEmitter.cpp you'll see these functions:

void EmitSveHeader(RecordKeeper &Records, raw_ostream &OS) {
  SVEEmitter(Records).createHeader(OS);
}

void EmitSveBuiltins(RecordKeeper &Records, raw_ostream &OS) {
  SVEEmitter(Records).createBuiltins(OS);
}

void EmitSveBuiltinCG(RecordKeeper &Records, raw_ostream &OS) {
  SVEEmitter(Records).createCodeGenMap(OS);
}

void EmitSveRangeChecks(RecordKeeper &Records, raw_ostream &OS) {
  SVEEmitter(Records).createRangeChecks(OS);
}

It would be good to add similar ones for SME, i.e. EmitSmeRangeChecks, etc.

clang/lib/CodeGen/CGBuiltin.cpp
9504

I would prefer this to be handled inside it's own EmitAArch64SMEBuiltinExpr, since it's a separate architecture.

clang/utils/TableGen/SveEmitter.cpp
869

I would prefer this to be done more precisely via separate attribute flags, i.e. in the .td file decorate each ACLE builtin with something like IsShared, IsStreaming, IsStreamingCompatible, etc. otherwise you'll end up needing loads of flags for all different possible combinations. That way you can do:

if (this->Flags & Emitter.getEnumValueForFlag("IsStreaming"))
  this->SMEAttributes += "arm_streaming";

etc.

1045

If you move the builtins to their own file in arm_sme.td and emit the records using interfaces like EmitSmeBuiltins, etc. then you already know they are SME builtins and so don't need the flag.

Hi @sagarkulkarni19, just a gentle ping to see if you are still planning to do more work on this patch?

bryanpkc commandeered this revision.Nov 24 2022, 3:33 PM
bryanpkc added a reviewer: sagarkulkarni19.

Hi @sagarkulkarni19, just a gentle ping to see if you are still planning to do more work on this patch?

@david-arm we are going to update this patch very soon. Sorry for the delays.

bryanpkc updated this revision to Diff 481132.Dec 7 2022, 6:12 PM
bryanpkc marked 5 inline comments as done.
bryanpkc edited the summary of this revision. (Show Details)

Updated the patch according to review comments which suggest to treat SME as a different architecture.

bryanpkc marked 2 inline comments as done.Dec 7 2022, 6:18 PM

@david-arm, I have moved all the SME definitions into a new file, arm_sme.td. I have moved the common definitions into another file, arm_sve_sme_incl.td, which will be included by both arm_sve.td and arm_sme.td. SveEmitter has been updated to incorporate your suggestions.

bryanpkc updated this revision to Diff 481137.Dec 7 2022, 6:52 PM

Removed some diffs that weren't necessary.

bryanpkc updated this revision to Diff 481205.Dec 8 2022, 2:05 AM

Removed some more unnecessary lines.

bryanpkc updated this revision to Diff 492585.Jan 26 2023, 3:13 PM
bryanpkc retitled this revision from [Clang][AArch64] Add SME C intrinsics for load and store to [Clang][AArch64][SME] Add vector load/store (ld1/st1) intrinsics.
bryanpkc edited the summary of this revision. (Show Details)

Updated the patch to define the __ARM_FEATURE_SME macro when the feature is enabled.

bryanpkc updated this revision to Diff 492586.Jan 26 2023, 3:19 PM

Minor clean-up. Sorry for the noise.

Hi @bryanpkc, this is looking a lot better now and thanks for addressing the comments! I've not reviewed all of the patch yet, but I do have a few more comments. The most important ones are about performing immediate range checks for the builtins and not declaring the __ARM_FEATURE_SME yet.

clang/include/clang/Basic/TargetBuiltins.h
315

I actually don't think you need to add this class - we should be able to just reuse the existing SVETypeFlags structure. I think this is fine because you've commonised the flags between SME and SVE.

clang/include/clang/Basic/arm_sme.td
22

I think all the load and store instructions need immediate checks for the tile and slice_offset here such as:

[ImmCheck<0, ImmCheck0>, ImmCheck<2, ImmCheck0_15>]

for SVLD1_HOR_ZA8 and the others. It's mentioned in the ACLE - https://arm-software.github.io/acle/main/acle.html#sme-language-extensions-and-intrinsics:

15.4.3.1 Common Rules

...
Every argument named tile, slice_offset or tile_mask must be an integer constant expression in the range of the underlying instruction.
clang/include/clang/Basic/arm_sve_sme_incl.td
127

Please can you add a comment here for the new Prototype modifier you added - '%'?

clang/lib/Basic/Targets/AArch64.cpp
503

Can you remove this please? We can't really set this macro until the SME ABI and ACLE is feature complete.

clang/lib/CodeGen/CGBuiltin.cpp
9347

I think that instead of this switch statement you should just be able to write something like:

Ops[3] = EmitSVEPredicateCast(Ops[3],  getSVEVectorForElementType(SVEBuiltinMemEltTy(TypeFlags)))
9389

I think you can just call

CGM.getIntrinsic(Intrinsic::aarch64_sme_cntsb)
9391

Again, I think you can just do

Builder.CreateCall(StreamingVectorLength)
9400

nit: Function *F = CGM.getIntrinsic(IntID);

clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_int_const_expr_error.c
5 ↗(On Diff #492586)

Once you've added the immediate range checks for the loads and stores it would be good add checks here for immediates outside the range for each instruction.

clang/utils/TableGen/SveEmitter.cpp
1589

If you reuse the existing SVETypeFlags rather than create a new SMETypeFlags then you only need the LLVM_GET_SME_IMMCHECKTYPES bit.

dmgreen added inline comments.
clang/utils/TableGen/SveEmitter.cpp
1432

We have been changing how the existing SVE and NEON instrinsics work a little. There are some details in https://reviews.llvm.org/D131064. The short version is it is best to not rely upon preprocessor macros, and instead define the intrinsics so that they can be used if the right target features are available. This allows us to do things like this below, even without a -march that supports sme, and have them callable at runtime under the right situations. We should be doing the same for SME.

__attribute__((target("+sme")))
void sme_func() {
  somesmeintrinsics();
}
bryanpkc marked 9 inline comments as done.Feb 7 2023, 10:18 AM
bryanpkc added inline comments.
clang/include/clang/Basic/TargetBuiltins.h
315

Thanks for the suggestion. I have removed this class.

clang/include/clang/Basic/arm_sme.td
22

Done.

clang/include/clang/Basic/arm_sve_sme_incl.td
127

A comment for '%' already exists on line 103.

clang/lib/Basic/Targets/AArch64.cpp
503

OK. Could you educate me what else is needed for SME ABI and ACLE to be feature-complete? How can I help?

clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_int_const_expr_error.c
5 ↗(On Diff #492586)

Done, thanks for the suggestion. I have also moved these tests into clang/test/Sema/arm-sme-intrinsics/.

clang/utils/TableGen/SveEmitter.cpp
1432

I have refactored the SME intrinsics in a similar fashion. Could you confirm if I did it correctly?

bryanpkc updated this revision to Diff 495592.Feb 7 2023, 10:24 AM
bryanpkc marked 4 inline comments as done.

Rebased on trunk and addressed review comments.

Thanks a lot for making all the changes @bryanpkc - it's looking really good now! I just have a few minor comments/suggestions and then I think it looks good to go.

clang/include/clang/Basic/arm_sme.td
23

This is just a suggestion, but you could reduce the lines of code here if you want by creating a multiclass that creates both the horizontal and vertical variants for each size, i.e. something like

multiclass MyMultiClass<..> {
  def NAME # _H : MInst<...>
  def NAME # _V : MInst<...>
}

defm SVLD1_ZA8 : MyMultiClass<...>;

or whatever naming scheme you prefer, and same for the stores. Feel free to ignore this suggestion though if it doesn't help you!

clang/lib/Basic/Targets/AArch64.cpp
503

It should have complete support for the SME ABI and ACLE in terms of supporting the C/C++ level attributes as described here - https://arm-software.github.io/acle/main/acle.html#sme-language-extensions-and-intrinsics. For example, the compiler should support cases where a normal function calls a arm_streaming function and sets up the state correctly, etc. You can see @sdesmalen's patch to add the clang-side attributes here D127762. There should also be full support for all of the SME ACLE builtins.

clang/test/Sema/aarch64-sme-intrinsics/acle_sme_imm.cpp
4

I think you can remove the -target-feature +sve flags from the RUN lines because +sme should imply that.

17

These tests are great! Thanks for doing this. Could you also add the _vnum variants too?

sdesmalen added inline comments.Feb 9 2023, 4:04 AM
clang/lib/Headers/CMakeLists.txt
332

The ACLE specification is still in a draft (ALP) state, which means there may still be subject to significant changes. To avoid users from using this implementation with the expectation that their code is compliant going forward, it would be good to rename the header file to something that makes it very clear this feature is not yet ready to use. I'm thinking of something like arm_sme_draft_spec_subject_to_change.h. When the specification goes out of draft, we can rename it to arm_sme.h. Could you rename the file for now?

bryanpkc added inline comments.Feb 12 2023, 7:00 PM
clang/lib/Headers/CMakeLists.txt
332

Would something shorter like arm_sme_draft.h or arm_sme_experimental.h suffice?

bryanpkc updated this revision to Diff 496964.Feb 13 2023, 6:49 AM
bryanpkc edited the summary of this revision. (Show Details)

Addressed review comments.

bryanpkc marked 5 inline comments as done.Feb 13 2023, 6:51 AM
bryanpkc added inline comments.
clang/include/clang/Basic/arm_sme.td
23

Thanks for the suggestion. Refactored.

clang/lib/Headers/CMakeLists.txt
332

Renamed to arm_sme_experimental.h.

bryanpkc updated this revision to Diff 496976.Feb 13 2023, 7:29 AM
bryanpkc marked an inline comment as done.

Fixed minor bugs in the previous upload.

dmgreen added inline comments.Feb 13 2023, 8:11 AM
clang/utils/TableGen/SveEmitter.cpp
1432

It looks OK - as far as I can see. It might be worth adding a test for it? But otherwise it looks good. Thanks.

sdesmalen added inline comments.Feb 14 2023, 1:02 AM
clang/lib/Headers/CMakeLists.txt
332

While arm_sme_experimental.h is indeed shorter, it should be unequivocally clear to the user that they shouldn't rely on the function prototypes defined in this header file yet because the specification itself is not finalised. I think that adding the draft_spec_subject_to_change to the name makes that more clear than experimental, as the latter might suggest that the support is not yet entirely stable or complete. There probably isn't that much to gain from making the user experience better by using a shorter name, if the whole point is to prevent people from using it for any production code. So from that perspective, I still have a slight preference for arm_sme_draft_spec_subject_to_change.h.

bryanpkc updated this revision to Diff 498700.Feb 19 2023, 4:18 PM
bryanpkc marked 2 inline comments as done and an inline comment as not done.

Addressed review comments.

bryanpkc marked an inline comment as done.Feb 19 2023, 4:26 PM
bryanpkc added inline comments.
clang/lib/Headers/CMakeLists.txt
332

Renamed as suggested.

clang/test/Sema/aarch64-sme-intrinsics/acle_sme_imm.cpp
4

If I understand llvm/lib/Target/AArch64/AArch64.td and clang/lib/Basic/Targets/AArch64.cpp correctly, -target-feature +sme does not imply -target-feature +sve. I can remove the -D__ARM_FEATURE_SME though.

On the other hand, -march=armv8+sme will imply -target-feature +sve, which I have implemented in D142702.

clang/utils/TableGen/SveEmitter.cpp
1432

Added a test case.

kmclaughlin added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
8874

Is it necessary to add an EltTypeBool128? I think the EmitSVEPredicateCast call in EmitSMELd1St1 is creating a vector based on the memory element type and not the predicate type?

bryanpkc marked an inline comment as done.Feb 21 2023, 3:54 PM
bryanpkc added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
8874

You are right, EltTypeBool128 is not used right now; I can remove it. When we started working on the pre-ACLE intrinsics in our downstream compiler, we were planning to add something like svptrue_b128, even though the hardware PTRUE instruction doesn't support the Q size specifier. It is still not clear to me how the ACLE wants users to create a predicate vector for a call to a _za128 intrinsic.

kmclaughlin accepted this revision.Feb 23 2023, 9:31 AM

Thank you for checking and removing EltTypeBool128. I think you have addressed all of the other comments on this patch too, so it looks good to me!

Please can you update the commit message before landing, to change the reference to arm_sme_experimental.h with arm_sme_draft_spec_subject_to_change.h?

clang/lib/CodeGen/CGBuiltin.cpp
8874

I think the reason why svptrue_b128 wasn't added is because there is no ptrue.q instruction as you say, and it is possible to use any of the other forms (.b, .h, etc) with the 128 bit instructions since they only require every 16th lane.
It would be possible for the user to create a 128 bit predicate with either svunpk or svzip of svbool_t with pfalse where necessary. Otherwise using the other forms, e.g svptrue_b64(), will achieve the same result and require fewer instructions.

This revision is now accepted and ready to land.Feb 23 2023, 9:31 AM
bryanpkc updated this revision to Diff 500578.Feb 26 2023, 7:09 AM

Removed EltTypeBool128 as suggested.

bryanpkc marked 3 inline comments as done.Feb 26 2023, 7:11 AM
bryanpkc edited the summary of this revision. (Show Details)
bryanpkc updated this revision to Diff 500630.Feb 26 2023, 2:58 PM

Moved test case acle_target_sme.c into clang/test/Sema/aarch64-sme-intrinsics/.

bryanpkc updated this revision to Diff 500638.Feb 26 2023, 5:36 PM
bryanpkc edited the summary of this revision. (Show Details)

Fix some bugs in the range checks of immediate operands.

sdesmalen added inline comments.May 10 2023, 8:01 AM
clang/lib/Basic/Targets/AArch64.cpp
726

Why did you remove this?

clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_ld1.c
16

Hi @bryanpkc, would you be happy to remove the dependence on the attributes patch for now, so that we can move forward to review/land this patch series?

I'm thinking of doing something like:

// RUN: %clang_cc1 -DDISABLE_SME_ATTRIBUTES ....
...

#ifdef DISABLE_SME_ATTRIBUTES
#define ARM_STREAMING_ATTR
#else
#define ARM_STREAMING_ATTR __attribute__((arm_streaming))
#endif

...

ARM_STREAMING_ATTR void test_svld1_hor_za16(uint32_t slice_base, svbool_t pg, const void *ptr) {
  svld1_hor_za16(0, slice_base, 0, pg, ptr);
  svld1_hor_za16(1, slice_base, 7, pg, ptr);
}

With that the tests all pass, and when the attribute patches have landed we can just remove the -DDISABLE_SME_ATTRIBUTES from the RUN lines.

bryanpkc added inline comments.May 11 2023, 7:20 AM
clang/lib/Basic/Targets/AArch64.cpp
726

The Feature == "+sme" case is also handled below on line 782. Perhaps we should delete this block and add HasFullFP16 below.

clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_ld1.c
16

Sounds good, I will give that a shot. If we can unblock this patch, I will also rebase and update the rest of the patch series. Thanks!

bryanpkc updated this revision to Diff 522039.May 14 2023, 9:55 PM
bryanpkc edited the summary of this revision. (Show Details)
bryanpkc updated this revision to Diff 522064.May 15 2023, 12:04 AM

@sdesmalen How does the latest version look to you? If you approve, I will land this and finish updating the rest of the patch series. One of the pre-merge builds failed for an unrelated reason, but other jobs were successful.

sdesmalen accepted this revision.May 19 2023, 8:36 AM

Thanks @bryanpkc this patch looks good to me now. I'll make some time to review the other patches in the series as well after you update them.

This revision was landed with ongoing or failed builds.May 28 2023, 6:01 PM
This revision was automatically updated to reflect the committed changes.