Page MenuHomePhabricator

[AArch64][SME] Split up SME features. (alternative approach)
AbandonedPublic

Authored by sdesmalen on Mar 8 2022, 4:40 AM.

Details

Summary

This patch models SME features by adding features for the different PSTATE.SM
and PSTATE.ZA states, and implementing these as independent features to SME,
SVE(2) and NEON. This approach comes from the observation that setting
PSTATE.SM=1 invalidates instructions that are only valid when PSTATE.SM=0.
The same holds for setting PSTATE.SM=0, which invalidates SME instructions.
It can therefore be considered a subtractive feature, rather than an additive
feature.

This patch adds the following:

  • +pstate-sm0 is set by default for all subtargets. NEON/SVE/SVE2 instructions that are only valid when PSTATE.SM=0 are predicated with HasPSTATESM0. In contrast to runtime, for the compiler it is allowed to have both +pstate-sm0 and +pstate-sm1 set, as this makes both instructions available to the compiler. It is up to the compiler (or compiler-user in case of inline-asm) to guarantee that PSTATE.SM is sufficiently guarded/honoured.
  • +pstate-sm1 is enabled by default and enables all instructions valid under PSTATE.SM=1.
  • +pstate-za1 is enabled by default and enables all instructions valid under PSTATE.ZA=1.
  • +sme, +sme-i64 and +sme-f64 are the (only) user-visible flags to enable SME support.

The set of streaming-compatible NEON/SVE/SVE2 instructions are neither
predicated on HasPSTATESM0 nor HasPSTATESM1, and can therefore be
expressed with e.g. -mattr=+sve2,-pstate-sm0[,-pstate-sm1].

The set of streaming-agnostic SME instructions are not predicated on
HasPSTATESM1 and can be expressed with -mattr=+sme,-pstate-sm1.

This is an alternative approach as proposed in D120261 that follows from
an offline conversation with @rsandifo-arm.

Diff Detail

Event Timeline

sdesmalen created this revision.Mar 8 2022, 4:40 AM
sdesmalen requested review of this revision.Mar 8 2022, 4:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2022, 4:40 AM
sdesmalen edited the summary of this revision. (Show Details)

Simplified the patch by setting +pstate-sm1 and +pstate-za1 by default,
as neither of these have any effect without +sme. This reduces the need
to have separate internal/external flags for SME like in the first
revision because having +sme imply +pstate-sm1 and +pstate-za1 led
to problems when disabling the pstate features (which is kind of their
intended use).

This approach looks good to me FWIW. The main advantages as I see it are:

  • Treating PSTATE.SM and PSTATE.ZA as orthogonal to the set of implemented features follows the scheme used in the ISA specification, so it's easy to cross-check the two. The IsSVEEnabled review comment is an example of this.
  • It is likely to cope better with any future extensions to SVE or SME: the streaming-compatible, streaming and non-streaming subsets of each ISA extension would not need to be identified individually on the command line or in IR. Or to put it another way: requiring multiple +streaming-foo flags (and so repeating the “streaming”) feels like specifying the same bit of information multiple times.
  • It avoids the risk of combinatorial explosion if we ever need to do something similar for another degree of freedom.

I think my main open questions are:

  • Is this backwards-compatible? Can new tools consume old LLVM IR text and binaries and still work correctly?
  • Inline asms should be able to use any instruction, if they set up the environment correctly first. How do we cope with that without the compiler generating inappropriate instructions? However, like Sander said off-list, that question applies to both approaches, so it probably doesn't help choose between them.
llvm/lib/Target/AArch64/AArch64.td
457

I think from the user's perspective these flags are now normal unqualified "Enable ..” features, i.e. simple “is this feature present?” flags. The modality is applied on top.

466

I guess “both” means PSTATESM1 and PSTATEZA1. Might be worth saying that explicitly, since there are three features in play.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
147

Should this be: SVE | (SME & PSTATESM1) (converted to tabelgen)? That would follow the semantics of IsSVEEnabled in https://developer.arm.com/documentation/ddi0616/aa/?lang=en . Same for the other two StreamingCompatible features,

Matt added a subscriber: Matt.Mar 17 2022, 5:49 PM
sdesmalen marked 2 inline comments as done.Mar 18 2022, 2:52 AM

Hi @rsandifo-arm, thanks for your comments!

Is this backwards-compatible? Can new tools consume old LLVM IR text and binaries and still work correctly?

This approach is backwards-compatible, because the +pstate-* features are added 'automatically' when LLVM calls the constructor for the AArch64 Subtarget, which is the state that LLVM queries for any target-features/capabilities. It's worth pointing that once we add these flags, any bitcode created by LLVM will always have these features set in the IR, so it won't be easy to change/remove these flags afterwards.

Inline asms should be able to use any instruction, if they set up the environment correctly first. How do we cope with that without the compiler generating inappropriate instructions? However, like Sander said off-list, that question applies to both approaches, so it probably doesn't help choose between them.

While I believe this is a separate concern and should be addressed in a separate patch, I think we can implement this with an extra LLVM-internal feature flag. We could add some -force-assemble-pstate-sm1 feature that doesn't impact the hasSME()- or hasPSTATESM1()-questions the compiler may ask for example when generating code and is therefore only used by the assembler/disassembler. For example for a streaming-compatible function which gets the attributes -pstate-sm0,-pstate-sm1,-pstate-za1, we could let Clang add +force-assemble-pstate-sm0,+force-assemble-pstate-sm1,+force-assemble-pstate-za1 to force any inline-asm code to be assembled regardless of PSTATE.SM/ZA, even though for any other purposes LLVM will assume PSTATE.SM/ZA are not available.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
147

Yes it should, although there seems to be a problem with TableGen in trying to express this.
The assembler behaves as expected, with the exception of:

$ echo "ptrue p0.b, pow2" | ./bin/llvm-mc -mattr=+sme,-pstate-sm0,-pstate-sm1 -
      .text
      ptrue   p0.b, pow2

Where I had expected: <stdin>:1:1: error: instruction requires: sve or sme

I don't think this is a problem though, since pstate-sm0 and pstate-sm1 are not user-facing features, and also because the AssemblerPredicate relates only to the assembler.
Other code in LLVM will rely on the hasStreamingCompatible<Feature>() functions which return the correct result, so there's no chance of the compiler assuming it can use any instructions that it shouldn't. That's why I'm not too worried about having this as a FIXME for now.

Also as you mentioned, we want inline-asm to handle all instructions regardless of PSTATE.SM.

sdesmalen updated this revision to Diff 416444.Mar 18 2022, 3:30 AM
sdesmalen edited the summary of this revision. (Show Details)

Updated some of the comments and changed assembler behaviour to let +sme
enable all streaming-compatible SVE(2) instructions without necessarily
requiring +sve2 as well.

Hi @t.p.northover, I've added you as a reviewer since you're the AArch64 backend maintainer and the approach set out in this patch has some implications on (forward-)compatibility of the IR.

This patch tries to model the feature flags for predicating instructions for Arm's Scalable Matrix Extension (SME).

In short: this patch adds three new subtarget features and adds them (by default) to all AArch64 subtargets:

  • pstate-sm0 (enables instructions that are valid when SME streaming mode is enabled or SME is not available)
  • pstate-sm1 (enables instructions that are valid when SME streaming mode is enabled)
  • pstate-za1 (enables instructions that are valid when SME ZA accumulator array is enabled)

By disabling the above features we can limit LLVM's use of certain instructions. For example, when SME Streaming Mode is enabled, LLVM can't use most NEON instructions or certain SVE(2) instructions.
We express that with +sme,-pstate-sm0,+pstate-sm1 which:

  • Enables all streaming-compatible NEON/SVE/SVE2 instructions implied by +sme (which work regardless of whether PSTATE.SM=0 or PSTATE.SM=1)
  • Enables all SME instructions that are valid when PSTATE.SM=1.
  • Disables all instructions that rely on PSTATE.SM=0 (this is most of NEON and certain SVE(2) instructions)

By default, the pstate-sm0/sm1/za1 features are always added to the list of subtarget-features when constructing the AArch64Subtarget. This way, when the user passes +sme it will enable all SME instructions (regardless of pstate.sm/za).
This has the consequence that when compiling to bitcode, LLVM will always add +pstate-sm0/sm1/za1 as AArch64 features to the bitcode, making these features more or less a permanent requirement going forward irrespective of whether SME is targeted.

I'll admit that my SME knowledge is sketchy but I cannot help but think you're causing yourself problems by tying parts of the code generator together that don't need to be tied. As I see it there are two distinct problems that need to be solved:

  1. SME enables a set of instructions that partially overlap with NEON and SVE.
  2. SME has runtime modes that further restricts what instructions are available.

Both of these can be represented using feature flags but that doesn't mean they require symmetrical coverage.

The feature flags required for (1) are required to protect instruction definitions and usage (i.e. assembler support, isel patterns...). These features are user visible as used by -march= and similar options.

The feature flags [1] used for (2) exist to allow pre isel code generation to make informed choices as to what builtins/transformations are safe to use. When used correctly there should never be an instance where an isel pattern is used in error. If a code path really wants to emit instructions that sit outside of feature (2) (inline asm for example) it seems simpler to assume they know what they're doing and let them. Sure a bug might exist where this logic is wrong but treating (1) and (2) the same just means a runtime SIGILL is replaced with a compiler crash and I don't think the extra complexity is worth it. Especially when we're further down the road and fighting ten more years of new feature options.

I personally think the source of many of these problems will be much higher in the stack. For example, when it comes to the NEON and SVE builtins I figure we'll want to detect invalid calls within clang based on the c/c++ function decorations we'll need anyway. That way the user will see a nice diagnostic rather than a compiler crash or runtime failure.

[1] I use feature flags here as that's what you've implemented, but again here there's a choice. We can use whatever LLVM function decoration is most appropriate. I'm not saying a feature flag is wrong, in fact it seems clean enough, I'm just highlighting you've got options that should be considered as part of the wider SME support within LLVM.

sdesmalen abandoned this revision.Wed, Jun 1, 5:21 AM

Abandoning this patch in favour of modelling this at a higher level as opposed to using feature flags. See also D125977.