This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Show register fields using bitfield struct types
ClosedPublic

Authored by DavidSpickett on Mar 8 2023, 5:02 AM.

Details

Summary

This change uses the information from target.xml sent by
the GDB stub to produce C types that we can use to print
register fields.

lldb-server *does not* produce this information yet. This will
only work with GDB stubs that do. gdbserver or qemu
are 2 I know of. Testing is added that uses a mocked lldb-server.

(lldb) register read cpsr x0 fpcr fpsr x1
    cpsr = 0x60001000
         = (N = 0, Z = 1, C = 1, V = 0, TCO = 0, DIT = 0, UAO = 0, PAN = 0, SS = 0, IL = 0, SSBS = 1, BTYPE = 0, D = 0, A = 0, I = 0, F = 0, nRW = 0, EL = 0, SP = 0)

Only "register read" will display fields, and only when
we are not printing a register block.

For example, cpsr is a 32 bit register. Using the target's scratch type
system we construct a type:

struct __attribute__((__packed__)) cpsr {
  uint32_t N : 1;
  uint32_t Z : 1;
  ...
  uint32_t EL : 2;
  uint32_t SP : 1;
};

If this register had unallocated bits in it, those would
have been filled in by RegisterFlags as anonymous fields.
A new option "SetChildPrintingDecider" is added so we
can disable printing those.

Important things about this type:

  • It is packed so that sizeof(struct cpsr) == sizeof(the real register). (this will hold for all flags types we create)
  • Each field has the same storage type, which is the same as the type of the raw register value. This prevents fields being spilt over into more storage units, as is allowed by most ABIs.
  • Each bitfield size matches that of its register field.
  • The most significant field is first.

The last point is required because the most significant bit (MSB)
being on the left/top of a print out matches what you'd expect to
see in an architecture manual. In addition, having lldb print a
different field order on big/little endian hosts is not acceptable.

As a consequence, if the target is little endian we have to
reverse the order of the fields in the value. The value of each field
remains the same. For example 0b01 doesn't become 0b10, it just shifts
up or down.

This is needed because clang's type system assumes that for a struct
like the one above, the least significant bit (LSB) will be first
for a little endian target. We need the MSB to be first.

Finally, if lldb's host is a different endian to the target we have
to byte swap the host endian value to match the endian of the target's
typesystem.

Host EndianTarget EndianField Order SwapByte Order Swap
LittleLittleYesNo
BigLittleYesYes
LittleBigNoYes
BigBigNoNo

Testing was done as follows:

  • Little -> Little
    • LE AArch64 native debug.
  • Big -> Little
    • s390x lldb running under QEMU, connected to LE AArch64 target.
  • Little -> Big
    • LE AArch64 lldb connected to QEMU's GDB stub, which is running an s390x program.
  • Big -> Big
    • s390x lldb running under QEMU, connected to another QEMU's GDB stub, which is running an s390x program.

As we are not allowed to link core code to plugins directly,
I have added a new plugin RegisterTypeBuilder. There is one implementation
of this, RegisterTypeBuilderClang, which uses TypeSystemClang to build
the CompilerType from the register fields.

Diff Detail

Unit TestsFailed

Event Timeline

DavidSpickett created this revision.Mar 8 2023, 5:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2023, 5:02 AM
DavidSpickett requested review of this revision.Mar 8 2023, 5:02 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 8 2023, 5:02 AM

This all looks good to me. the phab says there's a missing newline at the end of TestXMLRegisterFlags.py.

Given that we can construct fake debug sessions with a gdb_remote_client test, do you think it would be worth having a test that's says it's a big-endian target, and confirming that we byte swap when the test is on a little-endian host? I see you're using a .yaml shell binary for the session, but I wrote a simple client test a while ago, TestStopPCs.py, where I don't even bother with that.

On the other hand, I noticed you marked these as skipIfLLVMTargetMissing("AArch64") - is this because of the clang types? We're not disassembling or jitting anything for the target, as long as lldb recognizes the triple/ArchSpec, for registers it seems like it should be fine? If this skip is really needed, then I don't know how many people would exercise an s390x test even if it was added - I don't personally build with that target normally.

jasonmolenda accepted this revision.Mar 9 2023, 3:28 PM

I'll mark it as accepted, I don't know if Jim or Pavel have any comments. I think this is good.

This revision is now accepted and ready to land.Mar 9 2023, 3:28 PM
  • Rebase
  • Add a test to show the register fields respect the child limit setting.
  • Make sure alignment is correct both when printed as one line and multiple.
  • Correct indentation in test file so it's all 4 space tabs.
DavidSpickett edited the summary of this revision. (Show Details)Mar 10 2023, 5:32 AM

The last update changes the format slightly, though for the better I think.

(lldb) register read cpsr x0 fpcr fpsr x1
    cpsr = 0x60001000
         = (N = 0, Z = 1, C = 1, V = 0, TCO = 0, DIT = 0, UAO = 0, PAN = 0, SS = 0, IL = 0, SSBS = 1, BTYPE = 0, D = 0, A = 0, I = 0, F = 0, nRW = 0, EL = 0, SP = 0)

As we don't print the root type name we get this orphaned =, so I've aligned that with the usual value line above. I think it's clear that it's a second view on the register not a new name (and removing the = is fiddly).

If the fields end up on multiple lines, we handle that too:

(lldb) register read fpcr
    fpcr = 0x00000000
         = {
             AHP = 0
             DN = 0
<...>
           }

This all looks good to me. the phab says there's a missing newline at the end of TestXMLRegisterFlags.py.

Fixed.

Remove need for aarch64 yaml file in tests. Refactor the responders.

I'm trying to get an s390x test going in the same fashion but figuring it out is tricky.

Also, I realise that all this XML substitution with strings is very brittle. I want to replace that with Python's xml.etree but will do that later in another patch.

Add an s390x test for big endian byte ordering. Turns out you can fake AArch64
without the backend, but not s390x.

Luckily, s390x is a default backend. So the LE host -> BE target part will be
tested widely. BE to LE, the AArch64 little endian byte order test will cover,
assuming there is an s390x builder out there somewhere.

DavidSpickett edited the summary of this revision. (Show Details)Mar 13 2023, 3:58 AM

If this skip is really needed, then I don't know how many people would exercise an s390x test even if it was added - I don't personally build with that target normally.

This is now resolved. No skip needed for AArch64, but for whatever reason, it is needed for s390x.

This gdb_remote_client test looks real nice with this update.

Also, I realise that all this XML substitution with strings is very brittle. I want to replace that with Python's xml.etree but will do that later in another patch.

Having tried this, it gives you nice indentation but it's not that much different to the raw text. Except that you can't see exactly what's going in unless you turn on logging. It is useful to have example of the format here at a glance and I don't see there being too much churn in this file so I'm leaving it as is.

The shakiest aspect here is probably my use of the scratch type system. It works but I'm not 100% that there isn't a better choice.

These are the current steps:

  • Make the type name. I've prepended __lldb_ as I saw elsewhere. So register cpsr would have a type __llb_register_fields_cpsr.
  • If we already have a type, use it.
  • If we don't, make one and use that.

So I'm assuming that the scratch type system exists for the life of the debug session. It can be reset and we would regenerate the type
as needed.

Another assumption is that we won't get a new batch of register info mid session. We would just keep using the existing type if we had
already made one. Seems fairly safe to me.

The type name might change in future to be more general e.g. __lldb_register_type_cpsr as and when we support the union and struct
elements that can also be in the XML. It's an internal detail, so this is fine.

JDevlieghere added inline comments.
lldb/include/lldb/Core/DumpRegisterValue.h
12

Core components (not just the Core libraries but basically everything that's not a plugin) shouldn't depend on plugins. There's still places in LLDB where this is the case but @bulbazord has been hard at work to clean those up. This would introduce a new one. Can we avoid this?

I'm supportive of this idea but I would like to find a way to do it without introducing dependencies on plugins in non-plugin code if possible.

lldb/include/lldb/Core/DumpRegisterValue.h
12

+1 Let's find a way to avoid this if possible. Perhaps we can use just TypeSystem?

lldb/source/Commands/CommandObjectRegister.cpp
212–213

Instead of being specific to clang, it would be better if we could use Target::GetScratchTypeSystemForLanguage(eLanguageTypeC) or something to this effect.

lldb/source/Core/DumpRegisterValue.cpp
136–137

This seems highly specific to C++... Let's try to find another way to do this, ideally with TypeSystem instead of TypeSystemClang and clang::CXXRecordDecl.

On the subject of not using TypeSystemClang, looking around it seems like TypeSystem is purely used to look up basic types, never to make new ones. When new types are made the code asks the TypeSystem for its TypeSystemClang and then uses that to do it.

Is that a limitation of TypeSystem or just that it's missing methods? I also looked for ways to build types without the type system then add them to it, but didn't find any. Do we have anyone who knows more about this? (I sure don't)

I could move the code into an existing plugin like Platform. Problems with that:

  • Platform implementations are plugins, the base class is not so I have to have 3 copies of the method.
  • When using a core file, the platform is always the host. Might be able to work around that by passing the core file's triple in each time.

Or put the code into a new plugin but then I've just got the same dependency problem with more steps.

There is code for dealing with user expressions so I could make the types out of strings and evaluate something like:

*reinterpret_cast(
  __attribute__((packed)) struct __lldb_register_fields_cpsr {
    uint32_t z;
    ...
  })(<fake address of register value)

It does feel like something that code wasn't intended to do.

On the subject of not using TypeSystemClang, looking around it seems like TypeSystem is purely used to look up basic types, never to make new ones. When new types are made the code asks the TypeSystem for its TypeSystemClang and then uses that to do it.

Is that a limitation of TypeSystem or just that it's missing methods? I also looked for ways to build types without the type system then add them to it, but didn't find any. Do we have anyone who knows more about this? (I sure don't)

I think @aprantl might know more about TypeSystems and what their intended use/capabilities are. It's entirely possible that TypeSystem is just not as flexible as it could be.

I could move the code into an existing plugin like Platform. Problems with that:

  • Platform implementations are plugins, the base class is not so I have to have 3 copies of the method.
  • When using a core file, the platform is always the host. Might be able to work around that by passing the core file's triple in each time.

Or put the code into a new plugin but then I've just got the same dependency problem with more steps.

There is code for dealing with user expressions so I could make the types out of strings and evaluate something like:

*reinterpret_cast(
  __attribute__((packed)) struct __lldb_register_fields_cpsr {
    uint32_t z;
    ...
  })(<fake address of register value)

It does feel like something that code wasn't intended to do.

Right, these solutions might not be as nice. I've been working to stop using plugins in non-plugin code for a few years now so I'd like to prevent us from adding new uses where possible. I don't want to block progress though! Let's see if @aprantl can help us out.

aprantl added inline comments.Mar 21 2023, 12:40 PM
lldb/source/Core/DumpRegisterValue.cpp
136–137

Are we saving a lot of code by going through the clang typesystem here, or would walking the bits and formatting the directly be roughly the same amount of code?

DavidSpickett added inline comments.Mar 22 2023, 2:08 AM
lldb/source/Core/DumpRegisterValue.cpp
136–137

Funny you should mention that.

I was initially doing that, but in the RFC it was suggested I use the existing printers for C types (https://discourse.llvm.org/t/rfc-showing-register-fields-in-lldb/64676/2). There's also a hint that this would make assigning to specific fields more easy (though that is a very far off goal).

This is what the first version looked like:

(lldb) register read cpsr
    cpsr = 0x60001000
| N | Z | C | V | TCO | DIT | UAO | PAN | SS | IL | SSBS | BTYPE | D | A | I | F | nRW | EL | SP |
| 0 | 1 | 1 | 0 |  0  |  0  |  0  |  0  | 0  | 0  |  1   |   0   | 0 | 0 | 0 | 0 |  0  | 0  | 0  |

Of course it would be refined based on line length and field size etc etc. A lot of the things the type printers already do, which is why using them looked like it would save some effort.

Some of that extra formatting code will get written anyway because I want to eventually add a register command that will show you the register layout. So table formatting is needed either way. That would produce something like:

(lldb) register info cpsr
| 31 | 30 | ...
| N  |  Z | ...

So overall no there's nothing preventing me from walking the bits. Assuming you already have code to format text tables nicely, it would be about the same amount of code.

I need to check if we have any generic table formatting code, or could extract some from somewhere.

aprantl added inline comments.Mar 22 2023, 9:18 AM
lldb/source/Core/DumpRegisterValue.cpp
136–137

Sorry I wasn't aware of the history :-)

aprantl added inline comments.Mar 22 2023, 9:23 AM
lldb/source/Commands/CommandObjectRegister.cpp
229

Do we need to pass this in explicitly or could the implementation do the Target::GetScratchTypeSystemForLanguage::GetForTarget(m_exe_ctx.GetTargetRef())? That would avoid introducing a plugin dependence at least in this file?

DavidSpickett added inline comments.Mar 22 2023, 9:26 AM
lldb/source/Core/DumpRegisterValue.cpp
136–137

Do you have an idea of how much work it would be to make TypeSystem do the same things TypeSystemClang does? Maybe more to the point, would I be trying to make TypeSystem do something it just isn't designed to do.

DavidSpickett added inline comments.Mar 22 2023, 9:28 AM
lldb/source/Commands/CommandObjectRegister.cpp
229

Yes I could do that. Though I think even Core/DumpRegisterValue.cpp using it isn't good.

Is that right @bulbazord ?

aprantl added inline comments.Mar 22 2023, 9:40 AM
lldb/source/Core/DumpRegisterValue.cpp
136–137

Fundamentally, creating new types is something so tightly coupled to a specific type system that it may not be a good idea to expose this through the generic TypeSystem interface.

We could design a generic interface that looks like

CompilerType ty = CreateRecordType({{"field1", CompilerType1}, {"field2", ...}})

but here you explicitly want to control packing and bitfields, and so you will end up with something that most likely only TypeSystemClang can fulfill anyway.

So the better solution may be along the lines of a register formatting plugin that depends on TypeSystemClang.

bulbazord added inline comments.Mar 22 2023, 10:29 AM
lldb/source/Commands/CommandObjectRegister.cpp
229

Yes, avoiding plugin dependencies in non-plugins would be ideal.

So the better solution may be along the lines of a register formatting plugin that depends on TypeSystemClang.

Based on discussion here and on another patch, I'm going to try Adrian's suggestion.

DavidSpickett planned changes to this revision.Apr 4 2023, 2:05 AM

Move the type generation into a new plugin type, RegisterTypeBuilder.

This means we're still resuing types if you read the same register
multiple times, but we are recreating the plugin each time. So that
could be improved if people think it's worth it.

The code path is now:

  • register read calls dumpregister
  • which calls GetRegisterType on Target
  • Target gets the plugin and asks it to make the type

The way it does all that is exactly the same as before, but now
we aren't linking directly to a plugin to do it.

Also I am assuming that RegisterTypeBuilderClang is the only implementation
and is always present (a singleton plugin sort of). Which seems fine but I
don't know if that's a good way to treat plugins in general.

This revision is now accepted and ready to land.Apr 5 2023, 7:32 AM
DavidSpickett edited the summary of this revision. (Show Details)Apr 5 2023, 7:33 AM
DavidSpickett marked 9 inline comments as done.Apr 5 2023, 7:35 AM

@bulbazord Please take a look and see if I've done the plugin/core code split correctly.

@bulbazord Please take a look and see if I've done the plugin/core code split correctly.

This looks fine. Thanks for being flexible and working with us to find something that maintains the split even if the end result feels more complex.