The "Matrix" part of the Scalable Matrix Extension is a new register
"ZA". You can think of this as a square matrix made up of scalable rows,
where each row is one scalable vector long. However it is not made
of the existing scalable vector registers, it is its own register.
Meaning that the size of ZA is the vector length in bytes * the vector
length in bytes.
https://developer.arm.com/documentation/ddi0616/latest/
It uses the streaming vector length, even when streaming mode itself
is not active. For this reason, it's register data heeader always includes
the streaming vector length and I've added a pseudo register "SVG"
for users to be able to see this. Since they may be interacting
with ZA outside of streaming mode, where "VG" would be the non-streaming
vector length.
Due to it's size I've changed kMaxRegisterByteSize to the maximum
possible ZA size and kTypicalRegisterByteSize will be the maximum
possible scalable vector size. Therefore ZA transactions will cause heap
allocations, and non ZA registers will perform exactly as before.
ZA can be enabled and disabled independently of streaming mode. The way
this works in ptrace is different to SVE versus streaming SVE. Writing
NT_ARM_ZA without register data disables ZA, writing NT_ARM_ZA with
register data enables ZA (LLDB will only support the latter, and only
because it's convenient for us to do so).
https://kernel.org/doc/html/v6.2/arm64/sme.html
LLDB does not handle registers that can appear and dissappear at
runtime. Rather than add complexity to implement that, LLDB will
show a block of 0s when ZA is disabled.
The alternative is not only updating the vector lengths every stop,
but every register definition. It's possible but I'm not sure it's worth
pursuing.
Users should refer to the SVCR register (added in the next patch)
for the final word on whether ZA is active or not.
Writing to "VG" during streaming mode will change the size of the
streaming sve registers and ZA. LLDB will not attempt to preserve
register values in this case, we'll just read back the undefined
content the kernel shows. This is in line with, as stated, the
kernel ABIs and the prospective software ABIs look like.
ZA is defined as a vector of size SVL*SVL, so the display in lldb
is very basic. A giant block of values. This is no worse than SVE,
just larger. There is scope to improve this but that can wait
until we see some use cases.
Testing:
- TestSVEThreadedDynamic now checks for correct SVG values.
- New test TestZAThreadedDynamic creates 3 threads with different ZA sizes and states and switches between them verifying the register value (derived from the existing threaded SVE test).
- New test TestZARegisterSaveRestore starts in a given SME state, runs a set of expressions in various orders, then checks that the original state has been restored.
- TestArm64DynamicRegsets has ZA and SVG checks added, including writing to ZA to enable it.
An SME enabled program has the following extra state:
- Streaming mode or non-streaming mode.
- ZA enabled or disabled.
- The active vector length.
Covering the transition between all possible states and all other
possible states is not viable, therefore the testing is a cross
section of that, all of which found real bugs in LLDB and the Linux
Kernel during development.
Many of those transitions will not be possible via LLDB
(e.g. disabling ZA) and many more are possible but unlikely to be
used in normal use.
Running these tests will as usual require QEMU as there is no
real SME hardware available at this time, and a very recent
kernel.
clang-format not found in user’s local PATH; not linting file.