Note: This requires later commits for ZA to function properly;
it is split for ease of review. Testing is also in a later patch.
The "Matrix" part of the Scalable Matrix Extension is a new register
"ZA". You can think of this as a square matrix made up of scalable rows,
where each row is one scalable vector long. However, it is not made
of the existing scalable vector registers; it is its own register.
This means that the size of ZA is the vector length in bytes * the vector
length in bytes.
https://developer.arm.com/documentation/ddi0616/latest/
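As a rough illustration of the size calculation above, here is a minimal
sketch. ZASizeInBytes is a hypothetical helper, not something LLDB provides,
and it assumes the architectural range of streaming vector lengths (128 to
2048 bits, always a power of two).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, for illustration only: the size of ZA in bytes for a
// given streaming vector length in bytes.
std::size_t ZASizeInBytes(std::size_t svl_bytes) {
  // Assumption: SVL is a power of two between 16 and 256 bytes.
  assert(svl_bytes >= 16 && svl_bytes <= 256);
  assert((svl_bytes & (svl_bytes - 1)) == 0);
  // ZA is a square: svl_bytes rows, each svl_bytes long.
  return svl_bytes * svl_bytes;
}

// ZASizeInBytes(16)  == 256    (smallest ZA)
// ZASizeInBytes(256) == 65536  (largest ZA)
```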
It uses the streaming vector length, even when streaming mode itself
is not active. For this reason, its register data header always
includes the streaming vector length.
Due to its size, I've changed kMaxRegisterByteSize to the maximum
possible ZA size, and kTypicalRegisterByteSize will be the maximum
possible scalable vector size. Therefore ZA transactions will cause heap
allocations, and non-ZA registers will perform exactly as before.
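To show why only ZA ends up on the heap, here is a simplified sketch of a
buffer with inline storage up to the typical size. RegisterBytes, kTypicalBytes
and kMaxBytes are hypothetical names for illustration; this is not how LLDB's
own register value class is written.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-ins for the two constants discussed above.
constexpr std::size_t kTypicalBytes = 256;  // largest scalable vector (2048 bits)
constexpr std::size_t kMaxBytes = 256 * 256; // largest ZA

// Hypothetical buffer: inline storage for typical registers, heap otherwise.
class RegisterBytes {
public:
  explicit RegisterBytes(std::size_t size) : m_size(size) {
    assert(size <= kMaxBytes);
    if (OnHeap())
      m_heap.resize(size); // only ZA-sized values reach this path
  }

  bool OnHeap() const { return m_size > kTypicalBytes; }
  std::size_t Size() const { return m_size; }
  std::uint8_t *Data() { return OnHeap() ? m_heap.data() : m_inline.data(); }

private:
  std::size_t m_size;
  std::array<std::uint8_t, kTypicalBytes> m_inline{};
  std::vector<std::uint8_t> m_heap;
};
```

With this shape, a 256 byte Z register stays in the inline array, while any ZA
larger than that allocates.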
ZA can be enabled and disabled independently of streaming mode. The way
this works in ptrace differs from how SVE versus streaming SVE is handled.
Writing NT_ARM_ZA without register data disables ZA; writing NT_ARM_ZA with
register data enables ZA (LLDB will only support the latter, and only
because it's convenient for us to do so).
https://kernel.org/doc/html/v6.2/arm64/sme.html
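The two kinds of NT_ARM_ZA write could be sketched as below. ZAHeader is a
local mirror of the kernel's user_za_header as I understand its layout, and
MakeZAPayload is a hypothetical helper. The sketch only builds the buffer; it
does not call ptrace, and it leaves max_size, max_vl and flags as zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed mirror of the kernel's user_za_header (16 bytes).
struct ZAHeader {
  std::uint32_t size;     // header plus any register data, in bytes
  std::uint32_t max_size;
  std::uint16_t vl;       // streaming vector length in bytes
  std::uint16_t max_vl;
  std::uint16_t flags;
  std::uint16_t reserved;
};
static_assert(sizeof(ZAHeader) == 16, "unexpected header layout");

// Hypothetical helper: pass nullptr to build a "disable ZA" payload (header
// only), or svl_bytes * svl_bytes bytes of data to build an "enable ZA" one.
std::vector<std::uint8_t> MakeZAPayload(std::uint16_t svl_bytes,
                                        const std::uint8_t *za_data) {
  const std::size_t data_size =
      za_data ? static_cast<std::size_t>(svl_bytes) * svl_bytes : 0;

  ZAHeader header{};
  header.vl = svl_bytes;
  header.size = static_cast<std::uint32_t>(sizeof(ZAHeader) + data_size);

  std::vector<std::uint8_t> payload(header.size);
  std::memcpy(payload.data(), &header, sizeof(header));
  if (za_data)
    std::memcpy(payload.data() + sizeof(header), za_data, data_size);
  return payload;
}
```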
LLDB does not handle registers that can appear and disappear at
runtime. Rather than add complexity to implement that, LLDB will
show a block of 0s when ZA is disabled.
The alternative is not only updating the vector lengths every stop,
but every register definition. It's possible but I'm not sure it's worth
pursuing.
Users should refer to the SVCR register (added in later patches)
for the final word on whether ZA is active or not.
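The "block of 0s" behaviour amounts to something like the following sketch.
ZAForDisplay is a hypothetical function, and it assumes the caller has already
split the register data (if any) from the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: what to show for ZA given the data that followed the
// header. An empty za_data means ZA is disabled.
std::vector<std::uint8_t> ZAForDisplay(std::size_t svl_bytes,
                                       const std::vector<std::uint8_t> &za_data) {
  const std::size_t za_size = svl_bytes * svl_bytes;
  if (za_data.empty())
    return std::vector<std::uint8_t>(za_size, 0); // disabled: all zeros
  assert(za_data.size() == za_size);
  return za_data; // enabled: the real contents
}
```

Note that an all-zero result is ambiguous on its own, since an enabled ZA can
also hold zeros, which is why SVCR is the thing to check.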
Writing to "VG" during streaming mode will change the size of the
streaming SVE registers and ZA. LLDB will not attempt to preserve
register values in this case; we'll just read back the undefined
content the kernel shows. This is in line with what the kernel ABI
and the prospective software ABIs look like.
ZA is defined as a vector of size SVL*SVL, so the display in LLDB
is very basic: a giant block of values. This is no worse than SVE,
just larger. There is scope to improve this but that can wait
until we see some use cases.
When ZA is inactive, can we avoid transferring these zeros over the gdb protocol and instead construct this register on the user side, without doing the transfer at all?