Adds a design document for implementing the SME ABI in LLVM. This document
can be used as a reference for follow-up patches that attempt to implement
the ABI.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
In one diff you disable inlining for SME functions. I am afraid that there are more incompatible transformations on functions, i.e., function merging, hot-cold splitting, outlining, ...
You state that the vscale can change during the execution of the program. I would have expected a LangRef update.
I agree a LangRef update makes sense if we're loosening the constraint that vscale is constant across the whole program. Might need some adjustments to make various transforms work; in particular, I suspect GEP constant expressions involving vscale break.
Are aarch64_pstate_sm_compatible functions allowed to call arbitrary functions? I guess there's some way to save/restore PSTATE.SM?
llvm/docs/AArch64SME.rst | ||
---|---|---|
279 | Not sure the CopyToReg+CopyFromReg thing is actually enough to prevent all the transforms you want to prevent... if I'm following correctly, you really need to prevent *any* operations from showing up in the middle of the call sequence, and the SelectionDAG scheduler isn't really set up to enforce that sort of constraint. Would be more obviously correct to use a pseduo-instruction for the call, and expand it to smstop+call+smstart using a post-isel hook. | |
382 | You could use a feature flag that reflects the state of PSTATE.SM, but doesn't actually enable/disable any instructions. But I guess that's basically the same thing as a function attribute. | |
398 | Disabling vectorization might not be enough to completely prevent the use of vector instructions, but probably close enough. (For example, we use vector instructions to lower popcount.) |
In one diff you disable inlining for SME functions. I am afraid that there are more incompatible transformations on functions, i.e., function merging, hot-cold splitting, outlining, ...
Having different functions using different instruction set extensions isn't that unusual. We support that sort of thing on many different targets, including x86. (I haven't looked at the diffs to see if it's using the usual target hooks for this.)
Would it help to have a idk v2scale or vmscale for the matrix mode and vscale for the non-matrix mode. Then both can be constant over time, but in some states it would invalid to reason about one but not about the other.
Enabling streaming mode means FP/vector code may be executed on a different processing element. This has implications beyond having a different SVE vector length since it also clears register state and changes the set of allowable instructions. For the purpose of LLVM IR it is sufficient to support only a single vscale as long as the input IR is restricted to not pass/return (pointers to) scalable vectors between such functions and any streaming mode changes happen only on function-call boundaries. The SME ACLE has similar restrictions, so Clang emitting these attributes will ensure these restrictions are honoured in the IR.
We'd rather not open the can of worms of supporting multiple vscales because this is not really a capability we need from LLVM. Even if it was a generic capability in LLVM then there would still be other issues we'd need to tackle in a similar way as outlined in this document, so I'm not sure if it's worth making a change to the LangRef.
llvm/docs/AArch64SME.rst | ||
---|---|---|
279 |
It seems that this is exactly what Glue is meant to do. It tells the SelectionDAG scheduler to put these operations in the same SUnit node, which means they get scheduled together without anything else being scheduled in between. Most of the call-chain is glued together in this way.
I experimented with that, but found that replacing the ISD::CALL with a pseudo node is not really feasible. There are some complications to this approach:
| |
398 | Yes you're right, we'll probably find cases where we need to limit the code-generator to lower things differently when in streaming mode. |
Rephrased section 2 (Handling PSTATE.SM) and replaced some uses of vscale with SVE vector length.
For the purpose of LLVM IR it is sufficient to support only a single vscale as long as the input IR is restricted to not pass/return (pointers to) scalable vectors between such functions and any streaming mode changes happen only on function-call boundaries.
Sure, that makes sense: supporting multiple different scales within a function would be complicated.
But you still need a LangRef change. Interprocedural optimizations need to know that "vscale" computed in the callee might be different from "vscale" computed in the caller. As a practical matter, it probably affects very few optimizations, but certain optimizations might need to be aware of it. (At the moment, this is most likely to be relevant for GEP constant expressions, since we don't really do interprocedural hoisting or sinking. But in theory it could also be relevant for calls to llvm.vscale.)
Thanks, I think that's fair enough, we can tone down the wording in the LangRef a bit.
Not sure the CopyToReg+CopyFromReg thing is actually enough to prevent all the transforms you want to prevent... if I'm following correctly, you really need to prevent *any* operations from showing up in the middle of the call sequence, and the SelectionDAG scheduler isn't really set up to enforce that sort of constraint. Would be more obviously correct to use a pseduo-instruction for the call, and expand it to smstop+call+smstart using a post-isel hook.