This is an archive of the discontinued LLVM Phabricator instance.

[lldb][AArch64] Add SME streaming vector length pseudo register
Abandoned · Public

Authored by DavidSpickett on Jul 14 2023, 1:12 AM.

Details

Reviewers
omjavaid
Summary

Reading the SVE registers of streaming mode from non-streaming mode,
and vice versa, returns invalid data, as their state is reset each
time you switch between them.

However, the vector length is distinct between the two modes.

The existing register "vg" will always report the vector length for
the current mode and this change adds "svg" which will always return
the streaming vector length.

non-streaming mode: vg  = non-streaming vector length
                    svg = streaming vector length
    streaming mode: vg  = streaming vector length
                    svg = streaming vector length
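
The table above can be written as a small lookup. This is only an illustrative sketch: the function name and the example lengths are hypothetical and not part of the patch.

```python
def visible_vector_lengths(streaming, non_streaming_vg, streaming_vg):
    """Return the values a user would see for vg and svg in the given mode."""
    # vg follows the current mode; svg always reports the streaming length.
    vg = streaming_vg if streaming else non_streaming_vg
    return {"vg": vg, "svg": streaming_vg}
```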

The content of svg is read from the NT_ARM_SSVE header, even if
we are in non-streaming mode, which we are allowed to do
(the result is just missing the register data in this situation).
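
As a rough illustration of what the header alone provides, the sketch below unpacks the 16-byte `struct user_sve_header` layout from the Linux arm64 uapi headers and converts the vector length into 64-bit granules, the unit vg uses. The function name and the sample bytes are hypothetical; lldb obtains this data through ptrace, not through Python.

```python
import struct

# Layout of struct user_sve_header (Linux arm64 uapi):
# u32 size, u32 max_size, u16 vl, u16 max_vl, u16 flags, u16 reserved.
SVE_HEADER = struct.Struct("<IIHHHH")

def streaming_vg_from_header(regset_bytes):
    """Derive svg from the first 16 bytes of an NT_ARM_SSVE regset read."""
    size, max_size, vl, max_vl, flags, _ = SVE_HEADER.unpack_from(regset_bytes)
    # vl is in bytes; vg counts 64-bit (8-byte) granules.
    return vl // 8

# Hypothetical header-only read: 32-byte vector length, no register data.
sample = SVE_HEADER.pack(16, 16, 32, 256, 0, 0)
print(streaming_vg_from_header(sample))
```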

It is read only for the moment. It may be made writeable in future
patches. It has been added to the SME register set and I've converted
that into a struct for easier handling.

The SVE dynamic size test has been updated to check the expected
svg values as it is already set up to break just after a mode switch.

Event Timeline

DavidSpickett created this revision. Jul 14 2023, 1:12 AM
Herald added a project: Restricted Project. Jul 14 2023, 1:12 AM
DavidSpickett requested review of this revision. Jul 14 2023, 1:12 AM
DavidSpickett edited the summary of this revision. Jul 14 2023, 1:13 AM
DavidSpickett added a reviewer: omjavaid.

Just curious -- would it be better to have a single vg register shown to the user which is the vector length in non-streaming mode when the processor is in non-streaming mode, and the vector length in streaming mode when the processor is in streaming mode? I haven't worked with a target supporting SSVE mode, but if I'm following correctly we can show the SVE registers in non-streaming mode (and vg) and we can show the SSVE registers in streaming mode (and svg). As you say above, we can show svg when in non-streaming mode but we can't show vg when in streaming mode. Should we only show a single vg for the currently-available registers?

Matt added a subscriber: Matt. Jul 14 2023, 2:42 PM
DavidSpickett planned changes to this revision. Jul 17 2023, 1:01 AM

As you say above, we can show svg when in non-streaming mode but we can't show vg when in streaming mode. Should we only show a single vg for the currently-available registers?

I will admit I don't have practical experience to justify adding svg, I'm just guessing it would be nice to know what vector length you're about to switch into. That argument plays just as well for streaming -> non-streaming switches though.

The problem with that is we don't have an obvious name (or place) to show the non-streaming vg. For svg, we're adding the new register set to hold svcr anyway (which is essential because it's the only way to know the active mode). The architecture doesn't assign a new name to the non-streaming mode, so you could call it "svevg" and "ssvevg" but that's no help to the average person who hasn't (and shouldn't have to) read the manual.

Given that this isn't an essential feature, I'll put this on the backburner. At least until after 17, once I have more experience writing SME code.

As you say above, we can show svg when in non-streaming mode but we can't show vg when in streaming mode. Should we only show a single vg for the currently-available registers?

I will admit I don't have practical experience to justify adding svg, I'm just guessing it would be nice to know what vector length you're about to switch into. That argument plays just as well for streaming -> non-streaming switches though.

Oh yeah, don't take my question too seriously, my suggestion is formed entirely by looking at your patches and never having seen SVE or SME code myself.

Could you offer higher abstractions? Show me the current SVME vector length? Show me the current SVME mode?

Could you offer higher abstractions? Show me the current SVME vector length? Show me the current SVME mode?

Adding it to process status is along those lines, we have stuff like the number of addressable bits there right now. Overall I prefer the registers route just for visibility but we'll see what the early users find.

I would never question giving low-level access to the registers. As you mentioned, less experienced users could accidentally switch between the modes without knowing.

lldb/test/API/commands/register/register/aarch64_sve_registers/rw_access_dynamic_resize/TestSVEThreadedDynamic.py
262

Newline please

I would never question giving low-level access to the registers.

Well in your defense, both svg and svcr will actually be pseudo registers. So the user isn't getting access to the "real" ones either way, we're emulating the behaviour with ptrace commands.

As you mentioned, less experienced users could accidentally switch between the modes without knowing.

If we follow the kernel to the letter you can also mode switch by writing floating point registers while in streaming mode. That currently doesn't happen due to the way we model v registers as subsets of z but I might have to change that and if I do, that's another potential pitfall.

Ideally we would have as few routes to mode switch via the debugger as possible. Writing to the streaming vector control register is the single route I'd support given the choice.

Ideally we would have as few routes to mode switch via the debugger as possible. Writing to the streaming vector control register is the single route I'd support given the choice.

I wonder about function calls when in streaming mode, where someone might not even realize they're in it. Does lldb-server support QSaveRegisterState / QRestoreRegisterState around inferior function calls, or does lldb use the g/G packet (or write all the registers individually)? g/G are probably going to fetch / write all the floating point registers and reset the mode if you did a function call while in streaming mode?

I think in https://reviews.llvm.org/D154926, lldb/test/API/commands/register/register/aarch64_sve_registers/rw_access_static_config/TestSVERegisters.py addresses this. If what you mean is you are stopped in streaming mode, you evaluate an expression that may call a function which takes you into another mode.

If not, give me an example and I'll try to test it. This is the first I'm hearing of QSaveRegisterState / QRestoreRegisterState.

g/G are probably going to fetch / write all the floating point registers and reset the mode if you did a function call while in streaming mode?

We'd have to order them carefully I expect. Or say something like if we're restoring floating point and SVE registers, just ignore the floating point because we're about to supersede it.

I am not 100% sure that one cannot implement streaming SVE as a completely separate register state, I will be checking that today. If you can then it will complicate things in theory.

I think in https://reviews.llvm.org/D154926, lldb/test/API/commands/register/register/aarch64_sve_registers/rw_access_static_config/TestSVERegisters.py addresses this. If what you mean is you are stopped in streaming mode, you evaluate an expression that may call a function which takes you into another mode.

Oh it sounds like you've already been thinking about this, no worries. I was thinking about the mere act of restoring the register state. When lldb does a function call, it saves all of the register contents, does the function call, then restores the registers. This was traditionally done with the g/G (read/write the entire register context), if that is supported by the stub. Alternatively registers can all be read/written individually. As a simplification of all of this, and to avoid using g/G, we added QSaveRegisterState which tells the stub (debugserver etc) to save the current register context, and then after the inferior function call has completed, QRestoreRegisterState to restore them all.

In the process of restoring / writing the registers, I expect we will try to write the floating point register contents into the process which would drop it out of SSVE mode?

As a simplification of all of this, and to avoid using g/G, we added QSaveRegisterState which tells the stub (debugserver etc) to save the current register context, and then after the inferior function call has completed, QRestoreRegisterState to restore them all.

Looks like we do support that in lldb-server, I just hadn't come across it because I've been down at the native process level. I've updated ReadAllRegisterValues down there so it will restore to whatever the saved mode was.
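
A toy model of the save/restore concern discussed here. Every name in it (`FakeInferior`, `save_register_state`, `restore_register_state`) is hypothetical and does not correspond to an lldb API. It only shows why the saved state needs to record the mode, and why the mode has to be restored before the data.

```python
import copy

class FakeInferior:
    """Hypothetical stand-in for a debugged process with two SVE modes."""
    def __init__(self):
        self.streaming = False
        self.z_regs = [bytes(16)]  # register data for the current mode only

    def switch_mode(self, streaming):
        # Switching modes resets the SVE register state, per the summary.
        if streaming != self.streaming:
            self.streaming = streaming
            self.z_regs = [bytes(len(r)) for r in self.z_regs]

def save_register_state(inferior):
    # Record the mode alongside the data so restore knows where to return.
    return {"streaming": inferior.streaming,
            "z_regs": copy.deepcopy(inferior.z_regs)}

def restore_register_state(inferior, saved):
    # Restore the mode first, then the data; the reverse order would wipe it.
    inferior.switch_mode(saved["streaming"])
    inferior.z_regs = copy.deepcopy(saved["z_regs"])

# Stopped in streaming mode, then a called function leaves streaming mode.
proc = FakeInferior()
proc.switch_mode(True)
proc.z_regs[0] = b"\x01" * 16
saved = save_register_state(proc)
proc.switch_mode(False)
restore_register_state(proc, saved)
```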

In the process of restoring / writing the registers, I expect we will try to write the floating point register contents into the process which would drop it out of SSVE mode?

Right, yes it would if we follow this statement from the kernel docs:

Note that when SME is present and streaming SVE mode is in use the FPSIMD subset of registers will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode in the target.

I am talking to our kernel folks to understand the background to that. I suspect that it may be the case that for example, writing to the bottom 128 bits of streaming mode z0 may not be reflected in the SIMD unit's v0. Or at least, one could build a core that acted that way.

I suspect that it may be the case that for example, writing to the bottom 128 bits of streaming mode z0 may not be reflected in the SIMD unit's v0. Or at least, one could build a core that acted that way.

But the user would be very confused by this given that if you are stopped here in streaming mode:

mov v0.d[0], x0

That instruction would actually see the bottom 128 bits of streaming z0, even if elsewhere there is another, inactive v0 register in the core. If the user then does register write v0 {....} I doubt they would expect it to mode switch and write to a whole different v0, it should update z0.

So even if on a hardware level this configuration is possible, I don't think it's good to have the debugger act this way. Better that we think of this from the perspective of a running instruction, what would it see and therefore what would the user expect to happen.
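
The instruction's-eye view can be sketched as follows, assuming a register value is held as little-endian bytes. The helper names are hypothetical, and the choice to leave the upper bits of z untouched on a write is an assumption of this sketch, not a statement of what lldb does.

```python
V_BYTES = 16  # a v register is 128 bits

def read_v(z_value):
    """Return the v register view: the low 128 bits of the z register."""
    return z_value[:V_BYTES]

def write_v(z_value, v_value):
    """Return z with its low 128 bits replaced; no mode switch is modelled."""
    if len(v_value) != V_BYTES:
        raise ValueError("v registers are 16 bytes")
    # Assumption for this sketch: the upper bits of z are left unchanged.
    return v_value + z_value[V_BYTES:]
```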

I am talking to our kernel folks to understand the background to that.

The result is that yes, cores can implement it as separate state, but as mentioned here, taking that into account in lldb would be rather confusing in 99% of situations. If we simply want to read what instructions in the current context will see, using the bottom 128 bits of the Z registers is fine.

DavidSpickett abandoned this revision. Jul 27 2023, 1:57 AM

Turns out that for ZA support, we need to know the streaming vector length regardless of current mode. So SVG will be implemented as part of ZA support instead.