This is an archive of the discontinued LLVM Phabricator instance.

[lldb][AArch64] Add SME's Array Storage (ZA) and streaming vector length (SVG) registers

Authored by DavidSpickett on Aug 14 2023, 8:06 AM.



The "Matrix" part of the Scalable Matrix Extension is a new register
"ZA". You can think of this as a square matrix made up of scalable rows,
where each row is one scalable vector long. However it is not made
of the existing scalable vector registers, it is its own register.
Meaning that the size of ZA is the vector length in bytes * the vector
length in bytes.

It uses the streaming vector length, even when streaming mode itself
is not active. For this reason, it's register data heeader always includes
the streaming vector length and I've added a pseudo register "SVG"
for users to be able to see this. Since they may be interacting
with ZA outside of streaming mode, where "VG" would be the non-streaming
vector length.

Due to it's size I've changed kMaxRegisterByteSize to the maximum
possible ZA size and kTypicalRegisterByteSize will be the maximum
possible scalable vector size. Therefore ZA transactions will cause heap
allocations, and non ZA registers will perform exactly as before.

ZA can be enabled and disabled independently of streaming mode. The way
this works in ptrace is different to SVE versus streaming SVE. Writing
NT_ARM_ZA without register data disables ZA, writing NT_ARM_ZA with
register data enables ZA (LLDB will only support the latter, and only
because it's convenient for us to do so).

LLDB does not handle registers that can appear and dissappear at
runtime. Rather than add complexity to implement that, LLDB will
show a block of 0s when ZA is disabled.

The alternative is not only updating the vector lengths every stop,
but every register definition. It's possible but I'm not sure it's worth

Users should refer to the SVCR register (added in the next patch)
for the final word on whether ZA is active or not.

Writing to "VG" during streaming mode will change the size of the
streaming sve registers and ZA. LLDB will not attempt to preserve
register values in this case, we'll just read back the undefined
content the kernel shows. This is in line with, as stated, the
kernel ABIs and the prospective software ABIs look like.

ZA is defined as a vector of size SVL*SVL, so the display in lldb
is very basic. A giant block of values. This is no worse than SVE,
just larger. There is scope to improve this but that can wait
until we see some use cases.


  • TestSVEThreadedDynamic now checks for correct SVG values.
  • New test TestZAThreadedDynamic creates 3 threads with different ZA sizes and states and switches between them verifying the register value (derived from the existing threaded SVE test).
  • New test TestZARegisterSaveRestore starts in a given SME state, runs a set of expressions in various orders, then checks that the original state has been restored.
  • TestArm64DynamicRegsets has ZA and SVG checks added, including writing to ZA to enable it.

An SME enabled program has the following extra state:

  • Streaming mode or non-streaming mode.
  • ZA enabled or disabled.
  • The active vector length.

Covering the transition between all possible states and all other
possible states is not viable, therefore the testing is a cross
section of that, all of which found real bugs in LLDB and the Linux
Kernel during development.

Many of those transitions will not be possible via LLDB
(e.g. disabling ZA) and many more are possible but unlikely to be
used in normal use.

Running these tests will as usual require QEMU as there is no
real SME hardware available at this time, and a very recent

Diff Detail

Event Timeline

DavidSpickett created this revision.Aug 14 2023, 8:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 14 2023, 8:06 AM
DavidSpickett requested review of this revision.Aug 14 2023, 8:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 14 2023, 8:06 AM

I realise this is a giant patch, so if you see a way to split it, great. However, SVG is read from the ZA header and we need SVG to implement the resizing of ZA. So at least those are tied to each other.

Matt added a subscriber: Matt.Aug 14 2023, 2:40 PM

In, replace the while < 3 threads check in Python with
atomic variables in the program file. This fixes a situation where the first "process continue" hits the breakpoint in thread 2, and waits forever because thread 3 is never

The atomics mean that while one thread may break before the other, the other
is at least known to have started before this can happen.

(I'll apply the same fix to the SVE threaded test, which does occasionally
timeout on the buildbot, presumably due to this)

Replace missing process variable in TestZAThreadedDynamic.

Remove redundant get process calls in

As for the SVE test, the sync variables don't need to be atomic. Each one has one writer and one reader only.

Correct SVE reconfigure on non-SVE machines. It used to return if there was
no vg, which is correct, not sure why I changed that.

Rebase after changes to WriteAllRegisterValues.

omjavaid requested changes to this revision.Sep 12 2023, 1:43 AM

The patch looks to implement all stuff needed at once lets break it down into three patches instead:

  1. First patch that just adds SME register set and does not implement RegisterContext Read/Write routines.
  2. Implement register context read/write for SME registers
  3. Implement size re-configuration after changing size of the SME vector.
This revision now requires changes to proceed.Sep 12 2023, 1:43 AM
DavidSpickett abandoned this revision.Sep 12 2023, 4:51 AM

I've split them. I didn't split ZA definition and read/write because the former would be almost nothing, but otherwise it's into a few commits now ending with testing.

I won't fold the ZA bugfix into this as I think it's weird enough to warrant its own commit and therefore blame entry (and it doesn't show up in testing anyway).