This is an archive of the discontinued LLVM Phabricator instance.

[LLDB] Enable non-trivial types in EmulateInstruction::Context
AbandonedPublic

Authored by DavidSpickett on Sep 16 2022, 6:41 AM.

Details

Reviewers
aprantl
clayborg
Summary

In future I want to extend RegisterInfo and that will likely be non trivial
types like string and vector. RegisterInfo gets stored in Context, so Context
must be changed to a discriminated enum that properly calls destructors.

(which would be std::variant but although llvm is using -std=c++17 I don't
think we can expect c++17 library features just yet)

As so much existing code does this:
Context ctx;
// Lots of code that interacts with ctx...
ctx.Set<...>

I wasn't able to switch to factory functions for the Set<...> methods without a lot
more changes. Instead I've made the default constructor construct to the NoArgs type
and then Set<...> will change the type later.

Whenever you call a Set<...> it will destroy whatever is currently in the union
before setting the new data. Potentially non trivial types like RegisterInfos here are
placement newed because otherwise operator= would try to destroy the existing data
which is in fact uninitialised memory as far as we're concerned.

For the destructor we switch on the current info_type to know what to destruct.
(info_type that was made private in a previous change, so we know we have control of it)

Some places in lldb were using memset to initalise RegisterInfos. I tracked these
down with GCC's -Wclass-memaccess.
https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Options.html#index-Wclass-memaccess

To replace these I've added sensible default values for each member in RegisterInfo
and filled kinds with LLDB_INVALID_REGNUM in the default constructor.

Depends on D134039

Diff Detail

Event Timeline

DavidSpickett created this revision.Sep 16 2022, 6:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2022, 6:41 AM
DavidSpickett requested review of this revision.Sep 16 2022, 6:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2022, 6:41 AM

Note that this does not add any non trivial types but of course I did test it by adding some to RegisterInfo.

This work is early parts of https://discourse.llvm.org/t/rfc-showing-register-fields-in-lldb/64676.

lldb/source/Target/DynamicRegisterInfo.cpp
239

This being an instance where there is a now a sensible default constructor that is used instead.

If I read your summary correctly, then std::variant would simplify your code. LLVM still uses llvm::optional. As there is no LLVM variant, I would go for std::variant .

As there is no LLVM variant, I would go for std::variant

I was under the impression that we can use 17 language features but not library features, yet. Let me check that what the minimum toolchains have.

Variant is supported from:
libstdc++ 7.1
VS 2017 15.0
libcxx 4.0

So far so good relative to https://llvm.org/docs/GettingStarted.html#host-c-toolchain-both-compiler-and-standard-library.

https://en.cppreference.com/w/cpp/compiler_support/17#C.2B.2B17_library_features claims it is in apple clang 10.0.0 so unfortunately we're not quite there as the minimum is 9.3.

Sorry. I hoped that variant would make your life easier.

labath added a subscriber: labath.Sep 19 2022, 5:47 AM

I think the main reason that the RegisterInfo struct is such as it is, is because it wants to be trivially constructible (or whatever the c++ term for that is). I'm pretty sure that adding a std::string member will make it stop being that, and I don't think we suddenly want to run global constructors for all of the global RegisterInfo instances we have (and we have a lot of them).

If you wanted to preserve the existing patterns, the right way to store a string/array member inside a static struct would be to copy what is being done in the value_regs/invalidate_regs field, i.e., store the value itself in another static object, and then store the pointer to that object into RegisterInfo. For dynamic RegisterInfos (like those received from the lldb-server), you'd use DynamicRegisterInfo and DynamicRegisterInfo::Register, which have logic to persist/create storage for these values (that logic would have to be extended to handle the new fields as well).

That said I would *love* is someone changed the RegisterInfo structs into something saner, but I think that will need to be more elaborate than simply stuffing a std::vector member into it. I can give you my idea of how this could work, if you're interested in trying this out.

lldb/include/lldb/Core/EmulateInstruction.h
196

I actually think that the simplest solution here would be to store the RegisterInfos as pointers. Copying them around doesn't make sense, particularly if their size is about to grow.

342–345

I am not a C++ expert, but I have a strong suspicion that this is not correct. You're destroyed the whole object, but then are only recreating its individual submembers. I get that these are the only members which have non-trivial constructors, but I don't think that's how c++ works. I can confirm this with some c++ experts if you want, but I am pretty sure that you need to recreate the ContextUnion object as a whole here to avoid straying into UB territory.

lldb/include/lldb/lldb-private-types.h
67–77

Is this class still trivially constructible (not requiring dynamic initialization)?

Looks fine to me as this is all NFC with a look toward the future. Only one questions about if we want to switch the EmulateInstruction::GetRegisterInfo to return an optional as well?

lldb/source/Plugins/Instruction/ARM/EmulateInstructionARM.cpp
785–787

Do we want to switch this over to returning a llvm::Optional<RegisterInfo> as well in the base class?

I do share the concern that Pavel has where currently RegisterInfo does not require global constructors, and it would be good to keep it this way. Right now RegisterInfo is POD which means if there is a global array of RegisterInfo structs, it gets written into a data section and there is no code that runs.

So I might recommend adding a RegisterInfo member that is a "Fields *fields" that can be easily set to NULL in POD initializers, but then stored in the dynamic register info class in its own vector and then have the RegisterInfo struct point to another POD definition of the Fields.

Sorry. I hoped that variant would make your life easier.

Np, one can dream (until the next out of reach c++ feature comes along).

I think the main reason that the RegisterInfo struct is such as it is, is because it wants to be trivially constructible (or whatever the c++ term for that is). I'm pretty sure that adding a std::string member will make it stop being that, and I don't think we suddenly want to run global constructors for all of the global RegisterInfo instances we have (and we have a lot of them).

Agreed. I think I stumbled into a solution for this for different reasons.

I might have mentioned adding a std::string somewhere in this review but I only did that as a way to test this patch actually would work.

If you wanted to preserve the existing patterns, the right way to store a string/array member inside a static struct would be to copy what is being done in the value_regs/invalidate_regs field, i.e., store the value itself in another static object, and then store the pointer to that object into RegisterInfo. > For dynamic RegisterInfos (like those received from the lldb-server), you'd use DynamicRegisterInfo and DynamicRegisterInfo::Register, which have logic to persist/create storage for these values (that logic would have to be extended to handle the new fields as well).

Agreed.

That said I would *love* is someone changed the RegisterInfo structs into something saner, but I think that will need to be more elaborate than simply stuffing a std::vector member into it. I can give you my idea of how this could work, if you're interested in trying this out.

Sure I'd be interested in that. I've just been hacking on this small part of it so I don't have the full picture yet.

So I might recommend adding a RegisterInfo member that is a "Fields *fields" that can be easily set to NULL in POD initializers, but then stored in the dynamic register info class in its own vector and then have the RegisterInfo struct point to another POD definition of the Fields.

I though along the same lines but a different reason, I didn't want to store empty <register flags type> in every register info that didn't have flags and balloon the total size.

So I have:

struct RegisterInfo {
<...>
  /// If not nullptr, a custom type defined by XML descriptions.
  /// Using shared_ptr here means that the majority of registers that don't have
  /// custom types do not have to waste sizeof(RegisterFlags) bytes.
  std::shared_ptr<RegisterFlags> flags_type = nullptr;

But shared_ptr still has a destructor. It could be RegisterFlags*, I just have to work out when all that data needs to be dealocated. Probably when the target itself destructs but I need to figure that out, in general i don't know yet where the best place for these flags/fields to live is.

If it is a plain pointer then this patch is redundant but hey I learned how to use placement new :)

DavidSpickett added inline comments.Sep 21 2022, 1:43 AM
lldb/include/lldb/Core/EmulateInstruction.h
196

True, but sounds like we're going toward adding a pointer to the info. So that should keep the size constant.

342–345

I will check the rules here, it's not too hard to change anyway and it would mean I don't forget one of the members.

lldb/include/lldb/lldb-private-types.h
67–77

Current state, no. Due to the constructor setting kinds. Which is my attempt to remove the memset that was used elsewhere to do this, which sort of implies it was never trivially constructable really. Unless you knew you weren't going to read certain fields.

Given that:

#define LLDB_INVALID_REGNUM UINT32_MAX

And the memsets all used 0. At the very least it's opaque what the state of RegisterInfo is supposed to be.

That said I looked again at kNumRegisterKinds and it only goes up to 5. So I could just write this as:

uint32_t kinds[lldb::kNumRegisterKinds] = {LLDB_INVALID_REGNUM ... };

To avoid all this (I think I mixed it up with another enum with 100s of members).

lldb/source/Plugins/Instruction/ARM/EmulateInstructionARM.cpp
785–787

I will try this as a separate change, unconnected to this. In general any time we can change bool+ref to optional, I'll take it.

That said I would *love* is someone changed the RegisterInfo structs into something saner, but I think that will need to be more elaborate than simply stuffing a std::vector member into it. I can give you my idea of how this could work, if you're interested in trying this out.

Sure I'd be interested in that. I've just been hacking on this small part of it so I don't have the full picture yet.

I think that part of the problem is that nobody has a full picture of how RegisterInfos work anymore. :)

I don't claim to have this fully thought out, but the idea goes roughly like this. For the past few years, we've been moving towards a world where LLDB is able to fill in lots of details about the target registers. I think we're now at a state where it is sufficient for the remote stub to specify the register name and number, and lldb will be able to fill on most of the details about that register: EH/DWARF/"generic" numbers, subregisters, etc. However, this code is only invoked when communicating remote stub -- not for core files.
On one hand, this kind of makes sense -- for core files, we are the source of the register info, so why wouldn't we provide the complete info straight away? However, it means that the information that "ah is a one byte sub-register of rax at offset 1" is repeated multiple times (we currently have three core file plugins, times the number of architectures they support). If we made core file register infos go through this "augmentation" process, then we could unify our core file/live process flow more, and relieve the core file plugins of the burden of dealing with the subregisters -- all they'd need to know is how to read/write whole registers, and the generic code would take care of all the subregisters.
This would also mean that *all* register infos being handled by generic code would be DynamicRegisterInfos, which means we could drop the avoid this POD business, and just replace that class with something modern and easy to use. The only place where we would need to store static arrays would be in the bowels of individual plugins, but these would be simpler than the current RegisterInfo struct, as a lot of this info would be deduced (maybe including the register type information that you're trying to introduce), and we might even have each plugin store the info in whichever format it sees fit -- the only requirement would be that a DynamicRegisterInfo comes out at the end. Some plugins may choose not to store static info at all, as we're already running into the limits of what can be stored statically -- if an architecture has multiple optional registers sets (whose presence is only known at runtime), then its impossible to stuff those registers into a static array -- I believe all our AArch64 registers are currently dynamic for this reason.

I know this is somewhat vague, but that's why this is just an idea. Someone would have to try it out to find all the issues and figure them out. I can try to help if you want to take it on.

lldb/include/lldb/Core/EmulateInstruction.h
196

It should, but regardless of that, I'm surprised to see the structs being stored here, I'm not aware of any other place which stores RegisterInfos be value.

That said I would *love* is someone changed the RegisterInfo structs into something saner, but I think that will need to be more elaborate than simply stuffing a std::vector member into it. I can give you my idea of how this could work, if you're interested in trying this out.

Sure I'd be interested in that. I've just been hacking on this small part of it so I don't have the full picture yet.

I think that part of the problem is that nobody has a full picture of how RegisterInfos work anymore. :)

I don't claim to have this fully thought out, but the idea goes roughly like this. For the past few years, we've been moving towards a world where LLDB is able to fill in lots of details about the target registers. I think we're now at a state where it is sufficient for the remote stub to specify the register name and number, and lldb will be able to fill on most of the details about that register: EH/DWARF/"generic" numbers, subregisters, etc. However, this code is only invoked when communicating remote stub -- not for core files.
On one hand, this kind of makes sense -- for core files, we are the source of the register info, so why wouldn't we provide the complete info straight away?

An aside, I'm working with a group inside apple that has a JTAG style debugger that can access not only the normal general purpose registers/floating point/vector registers, but control registers like AArch64's MDSCR_EL1 as one example. I haven't figured out the best format for them to express this information in a Mach-O corefile yet, but I am thinking towards a Mach-O LC_NOTE where they embed an XML register description for all the registers they can provide (and it will require that size and maybe offset are explicitly specified, at least), and a register context buffer of bytes for all of the registers. lldb would need to augment its list of available registers with this. My vague memory is that they may have different registers available on each core ("thread") so we would need to do this on per-"thread" basis.

This is all vague hand-wavy at this point, but they have the information and the debugger users want this information. At some point I'll want the corefile to be able to augment or replace lldb's register information (probably augment).

That said I would *love* is someone changed the RegisterInfo structs into something saner, but I think that will need to be more elaborate than simply stuffing a std::vector member into it. I can give you my idea of how this could work, if you're interested in trying this out.

Sure I'd be interested in that. I've just been hacking on this small part of it so I don't have the full picture yet.

I think that part of the problem is that nobody has a full picture of how RegisterInfos work anymore. :)

I don't claim to have this fully thought out, but the idea goes roughly like this. For the past few years, we've been moving towards a world where LLDB is able to fill in lots of details about the target registers. I think we're now at a state where it is sufficient for the remote stub to specify the register name and number, and lldb will be able to fill on most of the details about that register: EH/DWARF/"generic" numbers, subregisters, etc. However, this code is only invoked when communicating remote stub -- not for core files.
On one hand, this kind of makes sense -- for core files, we are the source of the register info, so why wouldn't we provide the complete info straight away?

An aside, I'm working with a group inside apple that has a JTAG style debugger that can access not only the normal general purpose registers/floating point/vector registers, but control registers like AArch64's MDSCR_EL1 as one example. I haven't figured out the best format for them to express this information in a Mach-O corefile yet, but I am thinking towards a Mach-O LC_NOTE where they embed an XML register description for all the registers they can provide (and it will require that size and maybe offset are explicitly specified, at least), and a register context buffer of bytes for all of the registers. lldb would need to augment its list of available registers with this.

Thanks for jumping in Jason. I forgot about the size field. I think that we need that as well. As for the offset, how do the Mach-O core files work? Are all registers placed into a single note segment? (so that an "offset" is well-defined). In elf, the registers are scattered in multiple notes (roughly according to register sets), and we need to (arbitrarily) concatenate them so that a single offset value is meaningful. But what this really confirms to me that the notion of "static" register info lists is just not sufficient any more.

My vague memory is that they may have different registers available on each core ("thread") so we would need to do this on per-"thread" basis.

That's interesting, but I think we should actually be in a fairly good shape for that now, since in the case of ARM SVE, we can have each thread configure the scalable registers with a different size.

This is all vague hand-wavy at this point, but they have the information and the debugger users want this information. At some point I'll want the corefile to be able to augment or replace lldb's register information (probably augment).

Seems reasonable to me.

That said I would *love* is someone changed the RegisterInfo structs into something saner, but I think that will need to be more elaborate than simply stuffing a std::vector member into it. I can give you my idea of how this could work, if you're interested in trying this out.

Sure I'd be interested in that. I've just been hacking on this small part of it so I don't have the full picture yet.

I think that part of the problem is that nobody has a full picture of how RegisterInfos work anymore. :)

I don't claim to have this fully thought out, but the idea goes roughly like this. For the past few years, we've been moving towards a world where LLDB is able to fill in lots of details about the target registers. I think we're now at a state where it is sufficient for the remote stub to specify the register name and number, and lldb will be able to fill on most of the details about that register: EH/DWARF/"generic" numbers, subregisters, etc. However, this code is only invoked when communicating remote stub -- not for core files.
On one hand, this kind of makes sense -- for core files, we are the source of the register info, so why wouldn't we provide the complete info straight away?

An aside, I'm working with a group inside apple that has a JTAG style debugger that can access not only the normal general purpose registers/floating point/vector registers, but control registers like AArch64's MDSCR_EL1 as one example. I haven't figured out the best format for them to express this information in a Mach-O corefile yet, but I am thinking towards a Mach-O LC_NOTE where they embed an XML register description for all the registers they can provide (and it will require that size and maybe offset are explicitly specified, at least), and a register context buffer of bytes for all of the registers. lldb would need to augment its list of available registers with this.

Thanks for jumping in Jason. I forgot about the size field. I think that we need that as well. As for the offset, how do the Mach-O core files work? Are all registers placed into a single note segment? (so that an "offset" is well-defined). In elf, the registers are scattered in multiple notes (roughly according to register sets), and we need to (arbitrarily) concatenate them so that a single offset value is meaningful. But what this really confirms to me that the notion of "static" register info lists is just not sufficient any more.

Mach-o core files have registers in load commands (LC_THREAD) I believe. Within this, there is a register set flavor enum (GPR, FPU, EXC, DBG, etc), and then all of the registers are expected to be understood by whomever loads them and are given as here are N bytes for GPR. So something like:

LC_THREAD , CMDSIZE
FLAVOR, COUNT, BYTES
FLAVOR, COUNT, BYTES
ZERO

A Zero for flavor terminates the register data for a thread.

My vague memory is that they may have different registers available on each core ("thread") so we would need to do this on per-"thread" basis.

That's interesting, but I think we should actually be in a fairly good shape for that now, since in the case of ARM SVE, we can have each thread configure the scalable registers with a different size.

This is all vague hand-wavy at this point, but they have the information and the debugger users want this information. At some point I'll want the corefile to be able to augment or replace lldb's register information (probably augment).

Seems reasonable to me.

Typical core files seem to mostly have register data hard coded and the consumers are supposed to know exactly what is in register data (ELF core, mach-o core and minidump). Anything that gets us closer to our lldb-server workflow where we dynamically can figure out what the registers are is a good thing.

DavidSpickett abandoned this revision.Oct 6 2022, 7:43 AM

https://reviews.llvm.org/D135015 landed using std::variant and no one has complained so far, so this whole patch is now irrelevant. If it comes back, I'll just use variant.

I agree that there are reasons to keep things POD so I am working on that basis with the goal of being able to process information sent to us by remote stubs.

Core files and information provided by lldb itself, I'm thinking about in the background so thanks for the use cases there.