This is an archive of the discontinued LLVM Phabricator instance.

[lldb][AArch64] Add SME's Array Storage (ZA) register
ClosedPublic

Authored by DavidSpickett on Sep 12 2023, 4:48 AM.

Details

Summary

Note: This requires later commits for ZA to function properly,
it is split for ease of review. Testing is also in a later patch.

The "Matrix" part of the Scalable Matrix Extension is a new register
"ZA". You can think of this as a square matrix made up of scalable rows,
where each row is one scalable vector long. However it is not made
of the existing scalable vector registers, it is its own register.
Meaning that the size of ZA is the vector length in bytes * the vector
length in bytes.

https://developer.arm.com/documentation/ddi0616/latest/

It uses the streaming vector length, even when streaming mode itself
is not active. For this reason, it's register data header always
includes the streaming vector length.

Due to it's size I've changed kMaxRegisterByteSize to the maximum
possible ZA size and kTypicalRegisterByteSize will be the maximum
possible scalable vector size. Therefore ZA transactions will cause heap
allocations, and non ZA registers will perform exactly as before.

ZA can be enabled and disabled independently of streaming mode. The way
this works in ptrace is different to SVE versus streaming SVE. Writing
NT_ARM_ZA without register data disables ZA, writing NT_ARM_ZA with
register data enables ZA (LLDB will only support the latter, and only
because it's convenient for us to do so).

https://kernel.org/doc/html/v6.2/arm64/sme.html

LLDB does not handle registers that can appear and dissappear at
runtime. Rather than add complexity to implement that, LLDB will
show a block of 0s when ZA is disabled.

The alternative is not only updating the vector lengths every stop,
but every register definition. It's possible but I'm not sure it's worth
pursuing.

Users should refer to the SVCR register (added in later patches)
for the final word on whether ZA is active or not.

Writing to "VG" during streaming mode will change the size of the
streaming sve registers and ZA. LLDB will not attempt to preserve
register values in this case, we'll just read back the undefined
content the kernel shows. This is in line with, as stated, the
kernel ABIs and the prospective software ABIs look like.

ZA is defined as a vector of size SVL*SVL, so the display in lldb
is very basic. A giant block of values. This is no worse than SVE,
just larger. There is scope to improve this but that can wait
until we see some use cases.

Diff Detail

Event Timeline

DavidSpickett created this revision.Sep 12 2023, 4:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 12 2023, 4:48 AM
DavidSpickett requested review of this revision.Sep 12 2023, 4:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 12 2023, 4:48 AM

This looks good overall thanks for doing the patch split makes the review way less overwhelming.

lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp
342

In case of ZA inactive can we avoid having to transfer these zeros over gdb protocol and construct this register on the user side without even doing the transfer?

DavidSpickett added inline comments.Sep 13 2023, 1:57 AM
lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp
342

You could but you'd still need a way to signal to the user side that ZA is in fact inactive. Most obvious way is to send a single 0 and know that if you get that but the streaming vector length is provided too, we must extend the value.

However, that bumps up against a lot of checks in lldb's client side. I know there's at least one that fails if the data received is < the assumed size of the register (for more we just truncate it seems). Bypassing these for 1 target specific register is effort and more importantly, potentially hiding issues. The register size check was actually very useful developing ZA support, so I wouldn't like to subvert it.

GDB also doesn't do anything like this, so we would be compatible with a gdbserver that sent us all the data but an lldb-server wouldn't be compatible with a gdb because it only sends a bit of the data.

You're right that there's scope to compress the data here but it seems better done at a protocol level or by adding some kind of new way to describe registers overall.

Put ZA in the SME register set instead.

Once other patches are applied you end up with:

Scalable Matrix Extension Registers:
      svcr = 0x0000000000000000
       svg = 0x0000000000000004
        za = {0x00 <...>}
omjavaid added inline comments.Sep 14 2023, 12:47 AM
lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp
566

ZA -> SME?

lldb/source/Plugins/Process/Utility/RegisterInfoPOSIX_arm64.h
112

ConfigureVectorLength -> ConfigureVectorLengthSVE?

omjavaid added inline comments.Sep 14 2023, 12:54 AM
lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp
597

can you explain why in case of ZA inactive shouldnt we fill the buffer with zeros here as well ?

DavidSpickett added inline comments.Sep 14 2023, 1:03 AM
lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp
597

Start state: ZA is disabled, reading ptrace gives you just the header.

What we must do to restore that state is to write the header back with no register data. So that's why we don't insert 0s here or trust the currently cached za buffer.

Unlike SVE, there's no flag to say make this active. The presence of the register data is that flag.

...which I will put in a comment as well.

Address review comments.

DavidSpickett marked 3 inline comments as done.Sep 14 2023, 2:15 AM
DavidSpickett added inline comments.
lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp
597

Comment is a bit below this.

omjavaid accepted this revision.Sep 15 2023, 3:03 AM

This is LGTM. Thanks for your patience.

This revision is now accepted and ready to land.Sep 15 2023, 3:03 AM