
[LLDB] Add class WasmProcess for WebAssembly debugging
Needs ReviewPublic

Authored by paolosev on Apr 24 2020, 1:57 AM.

Details

Summary

This is the fourth in a series of patches to enable LLDB debugging of WebAssembly code that runs in a WebAssembly engine. Previous patches added ObjectFile, SymbolVendor and DynamicLoader plugin classes for Wasm, see: D71575, D72751, D72650.

The idea is to use the GDB-remote protocol to connect to a Wasm engine that implements a GDB-remote stub that offers the ability to access the engine's internal runtime state. This patch introduces a new Process plugin, wasm, with:

  • Class ProcessWasm that inherits from ProcessGDBRemote and that provides functions to access the Wasm engine state through a GDBRemote connection.
  • Class ThreadWasm that inherits from ThreadGDBRemote and that provides functions to access the Wasm call stack and create the Wasm stack unwinder.
  • Class UnwindWasm that manages stack unwinding for Wasm.

The code that represents a DWARF expression is now separated from the logic to evaluate that expression. The former is still in class DWARFExpression, while for the latter a new plugin type is introduced, DWARFEvaluator.

  • Class DWARFEvaluator contains the generic code for evaluating DWARF expressions, and it is possible to introduce platform-specific plugin classes to evaluate platform-specific DWARF opcodes.
  • Class DWARFEvaluatorFactory is the plugin base-class that is initialized, once per module, and cached in a Module instance. It creates DWARFEvaluator objects for a given DWARFExpression.
  • Class WasmDWARFEvaluatorFactory represents the plugin to create DWARFEvaluators specific for WebAssembly.
  • Class WasmDWARFEvaluator contains the logic to evaluate DW_OP_WASM_location, which requires accessing the Wasm engine through gdb-remote to query the state of the Wasm program (locals, globals, stack, memory and data sections), and to send requests through gdb-remote.

Note that the GDB-remote protocol needs to be extended with a few Wasm-specific custom query commands, implemented in ProcessWasm and used to access Wasm-specific constructs like the Wasm memory, Wasm locals and globals.

Wasm addresses are encoded as 64-bit values with this format:

63 61           32            0
+-+-------------+-------------+
|T|  module_id  |   offset    |
+-+-------------+-------------+

where T is 0:Code, 1:Data, 2:Memory.
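A minimal sketch of this encoding, assuming the 2-bit type field implied by the diagram (T needs two bits to hold the values 0, 1 and 2, leaving a 30-bit module_id and a 32-bit offset); the helper names are illustrative, not part of the patch:

```cpp
#include <cassert>
#include <cstdint>

// 64-bit Wasm address layout: type in bits 63-62, module_id in bits
// 61-32, offset in bits 31-0.
enum class WasmAddressType : uint64_t { Code = 0, Data = 1, Memory = 2 };

constexpr uint64_t MakeWasmAddress(WasmAddressType type, uint32_t module_id,
                                   uint32_t offset) {
  return (static_cast<uint64_t>(type) << 62) |
         ((static_cast<uint64_t>(module_id) & 0x3fffffff) << 32) | offset;
}

constexpr WasmAddressType GetType(uint64_t addr) {
  return static_cast<WasmAddressType>(addr >> 62);
}
constexpr uint32_t GetModuleId(uint64_t addr) {
  return static_cast<uint32_t>((addr >> 32) & 0x3fffffff);
}
constexpr uint32_t GetOffset(uint64_t addr) {
  return static_cast<uint32_t>(addr);
}
```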

Diff Detail

Event Timeline

paolosev created this revision. Apr 24 2020, 1:57 AM
paolosev marked 2 inline comments as done. Apr 24 2020, 2:03 AM

What is the best way to test classes WasmProcessGDBRemote and UnwindWasm?

lldb/source/Plugins/Process/wasm/ProcessWasm.cpp
93

This will be implemented as:

return GetGDBRemote().GetWasmLocal(frame_index, index, buf, buffer_size, size);

as soon as GetWasmLocal can be added to GDBRemoteCommunicationClient.

lldb/source/Plugins/Process/wasm/UnwindWasm.cpp
35–36

This cast works but it is ugly. Is there a better coding pattern I could use here?

Before we get into the details of this patch (with all the criss-cross friends and dependencies there's a lot to talk about there too), could you give an overview of how you imagine this working as a whole, and why it is necessary to create these new classes? Having spoken to some wasm folks, I think I know the answers to some of the "why"s. However, I don't think other developers do, and even I don't know the "how" story.

clayborg added a comment. Edited Apr 26 2020, 1:38 PM

So a few things here. It doesn't seem like it is necessary to create the WasmProcessGDBRemote and IWasmProcess. It would be fine to extend the current ProcessGDBRemote and ThreadGDBRemote classes. The whole reason seems to be that the variables (globals, locals, etc) are fetched through the GDB server API and that doesn't happen for other users of the protocol where this information is fetched via the debug info. Is this correct? You seem to have debug info and DWARF (since you mentioned a new DWARF expression opcode), so how do variables actually work? Do you use debug info? What info for variables do you need to fetch from the API?

It also seems that you fetch the stack backtrace via the GDB remote protocol as well. This would be easy to add in to the generic GDB remote protocol. This could also be built in at the lldb_private::Process/lldb_private::Thread API level where a process/thread specifies it fetches the variables and/or stack itself instead of letting the unwind engine do its thing. This can be really useful, for instance, for a core or minidump file that is able to store a backtrace, so that when you don't have all the system libraries you can still get a good backtrace from the core file. So the backtrace part should definitely be part of the core LLDB logic where it can ask a process or thread if it provides a backtrace or not, and we add new virtual APIs to the lldb_private::Process/lldb_private::Thread classes to detect and handle this. The ProcessGDBRemote and ThreadGDBRemote would then implement these functions and answer "yes" if the GDB server supports fetching these things.

So if you can elaborate in detail how variables work and how the stack trace works and exactly what needs to go through the GDB server API, we can work out how this should happen in LLDB. From what I understand right now I would:

  • modify lldb_private::Process/lldb_private::Thread to add new virtual (not pure virtual) APIs that answer "false" when asked if the process/thread provides variables and stacks
  • modify the GDB remote protocol to handle a new "qSupported" variant that asks if variables and stacks are supported via the API. Most GDB servers will answer with not supported. See https://sourceware.org/gdb/current/onlinedocs/gdb/General-Query-Packets.html#qSupported
  • modify ProcessGDBRemote and ThreadGDBRemote to override these APIs and answer "true" to handling variables and stack if the server supports this.
  • Modify the unwind code to ask the lldb_private::Thread if it provides a backtrace. If true, then skip the normal unwind and use the new APIs on lldb_private::Thread
  • Remove the ProcessWasm code and UnwindWasm code.
paolosev updated this revision to Diff 260216. Edited Apr 26 2020, 11:31 PM

I am adding all the pieces to this patch to make the whole picture clearer; I thought adding a piece at a time would simplify reviews, but it probably ended up making things more obscure. I can always split this patch later, and I need to refactor everything anyway.

So, the idea is to use DWARF as debug info for Wasm, as it is already supported by LLVM and Emscripten. For this we introduced some time ago the plugin classes ObjectFileWasm, SymbolVendorWasm and DynamicLoaderWasmDYLD. However, WebAssembly is peculiarly different from native targets. When source code is compiled to Wasm, Clang produces a module that contains Wasm bytecode (a bit like what happens with Java and C#), and the DWARF info refers to this bytecode.
The Wasm module then runs in a Wasm runtime. (It is also possible to AoT-compile Wasm to native, but this is outside the scope of this patch).

Therefore, LLDB cannot debug Wasm by just controlling the inferior process; it needs to talk with the Wasm engine to query the Wasm engine state. For example, for backtraces, only the runtime knows what the current call stack is. Hence the idea of using the gdb-remote protocol: if a Wasm engine has a GDB stub, LLDB can connect to it to start a debugging session and access its state.

Wasm execution is defined in terms of a stack machine. There are no registers (besides the implicit IP) and most Wasm instructions push/pop values into/from a virtual stack. Besides the stack the other possible stores are a set of parameters and locals defined in the function, a set of global variables defined in the module and the module memory, which is separated from the code address space.

The DWARF debug info to evaluate the value of variables is defined in terms of these constructs. For example, we can have something like this in DWARF:

0x00005a88:      DW_TAG_variable
                          DW_AT_location	(0x000006f3: 
                             [0x00000840, 0x00000850): DW_OP_WASM_location 0x0 +8, DW_OP_stack_value)
                          DW_AT_name	("xx")
                          DW_AT_type	(0x00002b17 "float")
                          […]

This says that in that address range the value of ‘xx’ can be evaluated as the content of the 8th local. Here DW_OP_WASM_location is a Wasm-specific opcode with two arguments: the first defines the store (0: Local, 1: Global, 2: the operand stack), the second the index in that store. In most cases the value of the variable could be retrieved from the Wasm memory instead.

So, when LLDB wants to evaluate this variable, in DWARFExpression::Evaluate(), it needs to know the current value of the Wasm locals, or to access the memory, and for this it needs to query the Wasm engine.

This is why there are changes to DWARFExpression::Evaluate(), to support the DW_OP_WASM_location case, and this is also why I created a class that derives from ProcessGDBRemote and overrides ReadMemory() in order to query the wasm engine. Also Value::GetValueAsData() needs to be modified when the value is retrieved from Wasm memory.

GDBRemoteCommunicationClient needs to be extended with a few Wasm-specific query packets:

  • qWasmGlobal: query the value of a Wasm global variable
  • qWasmLocal: query the value of a Wasm function argument or local
  • qWasmStackValue: query the value in the Wasm operand stack
  • qWasmMem: read from a Wasm memory
  • qWasmCallStack: retrieve the Wasm call stack.

These are all the changes we need to fully support Wasm debugging.

Why the IWasmProcess interface? I was not sure whether gdb-remote should be the only way to access the engine state. In the future LLDB could also use some other (and less chatty) mechanisms to communicate with a Wasm engine. I did not want to put a dependency on GDBRemote in a class like DWARFExpression or Value, which should not care about these details. Therefore, I thought that the new class WasmProcessGDBRemote could implement the IWasmProcess interface, forwarding requests through the base class ProcessGDBRemote, which then sends the new gdb-remote query packets. But I agree that this makes the code certainly more convoluted and quite ugly.

My initial idea was to keep all the Wasm-related code as much as possible isolated in plugin classes. Now, I guess that the next steps instead would be to refactor the code to eliminate the new classes WasmProcessGDBRemote and UnwindWasm and modify the existing ProcessGDBRemote and ThreadGDBRemote instead. However, I am not sure if this is possible without also touching the base classes Process and Thread. For example, let’s consider function DWARFExpression::Evaluate(). There, when the DWARF opcode is DW_OP_WASM_location, we need to access the Wasm state. We can get to the Process object with frame->CalculateProcess(); can we then assume the process must always be a ProcessGDBRemote if the target machine is llvm::Triple::wasm32, cast Process* to ProcessGDBRemote*, and then use Wasm-specific query functions added to that class? Would this pattern be acceptable, in your opinion?

PS, I am sorry for the late reply… this lockdown is making me a little unproductive… :-(

So a few things here. It doesn't seem like it is necessary to create the WasmProcessGDBRemote and IWasmProcess. It would be fine to extend the current ProcessGDBRemote and ThreadGDBRemote classes. The whole reason seems to be that the variables (globals, locals, etc) are fetched through the GDB server API and that doesn't happen for other users of the protocol where this information is fetched via the debug info. Is this correct? You seem to have debug info and DWARF (since you mentioned a new DWARF expression opcode), so how do variables actually work? Do you use debug info? What info for variables do you need to fetch from the API?

It also seems that you fetch the stack backtrace via the GDB remote protocol as well. This would be easy to add in to the generic GDB remote protocol. This could also be built in at the lldb_private::Process/lldb_private::Thread API level where a process/thread specifies it fetches the variables and/or stack itself instead of letting the unwind engine do its thing. This can be really useful, for instance, for a core or minidump file that is able to store a backtrace, so that when you don't have all the system libraries you can still get a good backtrace from the core file. So the backtrace part should definitely be part of the core LLDB logic where it can ask a process or thread if it provides a backtrace or not, and we add new virtual APIs to the lldb_private::Process/lldb_private::Thread classes to detect and handle this. The ProcessGDBRemote and ThreadGDBRemote would then implement these functions and answer "yes" if the GDB server supports fetching these things.

So if you can elaborate in detail how variables work and how the stack trace works and exactly what needs to go through the GDB server API, we can work out how this should happen in LLDB. From what I understand right now I would:

  • modify lldb_private::Process/lldb_private::Thread to add new virtual (not pure virtual) APIs that answer "false" when asked if the process/thread provides variables and stacks

The above idea is fairly interesting, but I don't see why a new API like that would be necessary to implement it. We already have an abstraction for a producer of stack frames -- the Unwind class. Reusing the existing abstraction (as this patch does) seems like a simpler/cleaner design than adding a new api, and then having users switch on its value.

My initial idea was to keep all the Wasm-related code as much as possible isolated in plugin classes.

While I would definitely like to see that happen, I don't think the current approach achieves that. The "IWasmProcess" is still in the wasm plugin so we still end up with a lot of "core" code depending on the wasm plugin. And if we put IWasmProcess into the "core", then it's not much different than putting the relevant APIs into the "Process" class directly (though it could introduce some grouping which might count for something). If we accept DW_OP_WASM_location as a first-class entity, then having some core interfaces to support it would not seem unreasonable. The thing which makes that blurry is that this is a vendor extension, not an official "first class" thing.

Now, I guess that the next steps instead would be to refactor the code to eliminate the new classes WasmProcessGDBRemote and UnwindWasm and modify existing ProcessGDBRemote and ThreadGDBRemote instead.

It may be interesting to see what that ends up looking like (maybe you could put that in a separate patch to compare), but I don't think that at this point we have chosen the right way to go forward, and we still need to discuss/think about things...

PS, I am sorry for the late reply… this lockdown is making me a little unproductive… :-(

You replied on the next business day. I don't think this is late by any standard.

Thanks for your comments!
I have refactored this code in a separate patch, https://reviews.llvm.org/D78978, removing WasmProcessGDBRemote, moving part of the logic into ProcessGDBRemote but still keeping class UnwindWasm.
Let me know what you think...

I am adding all the pieces to this patch to make the whole picture clearer; I thought adding a piece at a time would simplify reviews, but it probably ended up making things more obscure. I can always split this patch later, and I need to refactor everything anyway.

So, the idea is to use DWARF as debug info for Wasm, as it is already supported by LLVM and Emscripten. For this we introduced some time ago the plugin classes ObjectFileWasm, SymbolVendorWasm and DynamicLoaderWasmDYLD. However, WebAssembly is peculiarly different from native targets. When source code is compiled to Wasm, Clang produces a module that contains Wasm bytecode (a bit like what happens with Java and C#), and the DWARF info refers to this bytecode.
The Wasm module then runs in a Wasm runtime. (It is also possible to AoT-compile Wasm to native, but this is outside the scope of this patch).

Therefore, LLDB cannot debug Wasm by just controlling the inferior process; it needs to talk with the Wasm engine to query the Wasm engine state. For example, for backtraces, only the runtime knows what the current call stack is. Hence the idea of using the gdb-remote protocol: if a Wasm engine has a GDB stub, LLDB can connect to it to start a debugging session and access its state.

Wasm execution is defined in terms of a stack machine. There are no registers (besides the implicit IP) and most Wasm instructions push/pop values into/from a virtual stack. Besides the stack the other possible stores are a set of parameters and locals defined in the function, a set of global variables defined in the module and the module memory, which is separated from the code address space.

The DWARF debug info to evaluate the value of variables is defined in terms of these constructs. For example, we can have something like this in DWARF:

0x00005a88:      DW_TAG_variable
                          DW_AT_location	(0x000006f3: 
                             [0x00000840, 0x00000850): DW_OP_WASM_location 0x0 +8, DW_OP_stack_value)
                          DW_AT_name	("xx")
                          DW_AT_type	(0x00002b17 "float")
                          […]

This says that in that address range the value of ‘xx’ can be evaluated as the content of the 8th local. Here DW_OP_WASM_location is a Wasm-specific opcode with two arguments: the first defines the store (0: Local, 1: Global, 2: the operand stack), the second the index in that store. In most cases the value of the variable could be retrieved from the Wasm memory instead.

So is there memory to be read from the WASM runtime? Couldn't DW_OP_WASM_location 0x0 +8 be turned into an address that can be used to read the variable? It is also unclear what DW_OP_stack_value is used for here. The DWARF expression has no idea how many bytes to read for this value unless each virtual stack location knows how big it is? What happens if you have an array of a million items? That will not fit on the DWARF expression stack and each member would need to be read from memory?

It seems like the DW_OP_WASM_location + args should result in the address of the variable being pushed into the stack and the DW_OP_stack_value should be removed. This would mean at the end of the expression the address of the variable is on the stack and LLDB will just read it using the normal memory read? Am I missing something? Are there multiple memory regions? Are variables not considered to be in memory?

So, when LLDB wants to evaluate this variable, in DWARFExpression::Evaluate(), it needs to know the current value of the Wasm locals, or to access the memory, and for this it needs to query the Wasm engine.

This is why there are changes to DWARFExpression::Evaluate(), to support the DW_OP_WASM_location case, and this is also why I created a class that derives from ProcessGDBRemote and overrides ReadMemory() in order to query the wasm engine. Also Value::GetValueAsData() needs to be modified when the value is retrieved from Wasm memory.

It would be fine to ask the lldb_private::Process class to evaluate any unknown DWARF expression opcodes like DW_OP_WASM_location and return the result.

Why do we need to override read memory? Is there more than one address space? Can't the DWARF expression DW_OP_WASM_location + args turn into an address that normal read memory can access? Or are the virtual stacks separate and not actually in the address space? If the virtual stack slot for locals/globals and stack values always know their sizes and can provide the contents, the DW_OP_WASM_location opcode should end up creating a buffer just like DW_OP_piece does and the value will be contained in there in the DWARF expression and there is no need for the DW_OP_stack_value?

GDBRemoteCommunicationClient needs to be extended with a few Wasm-specific query packets:

  • qWasmGlobal: query the value of a Wasm global variable
  • qWasmLocal: query the value of a Wasm function argument or local
  • qWasmStackValue: query the value in the Wasm operand stack

These three could be boiled down to a "qEvaluateCustomDWARFExpressionOpcode" packet (shorter name please!) and the args like 0x0 and +8 can be sent. The result could provide the bytes for the value?

  • qWasmMem: read from a Wasm memory

How does normal memory reading differ from Wasm memory?

  • qWasmCallStack: retrieve the Wasm call stack.

Seems like this packet doesn't need to be Wasm specific. Are there any other GDB remote packets that fetch stack traces already that we would re-use?

These are all the changes we need to fully support Wasm debugging.

Why the IWasmProcess interface? I was not sure whether gdb-remote should be the only way to access the engine state. In the future LLDB could also use some other (and less chatty) mechanisms to communicate with a Wasm engine. I did not want to put a dependency on GDBRemote in a class like DWARFExpression or Value, which should not care about these details. Therefore, I thought that the new class WasmProcessGDBRemote could implement the IWasmProcess interface, forwarding requests through the base class ProcessGDBRemote, which then sends the new gdb-remote query packets. But I agree that this makes the code certainly more convoluted and quite ugly.

My initial idea was to keep all the Wasm-related code as much as possible isolated in plugin classes. Now, I guess that the next steps instead would be to refactor the code to eliminate the new classes WasmProcessGDBRemote and UnwindWasm and modify the existing ProcessGDBRemote and ThreadGDBRemote instead. However, I am not sure if this is possible without also touching the base classes Process and Thread. For example, let’s consider function DWARFExpression::Evaluate(). There, when the DWARF opcode is DW_OP_WASM_location, we need to access the Wasm state. We can get to the Process object with frame->CalculateProcess(); can we then assume the process must always be a ProcessGDBRemote if the target machine is llvm::Triple::wasm32, cast Process* to ProcessGDBRemote*, and then use Wasm-specific query functions added to that class? Would this pattern be acceptable, in your opinion?

A new virtual function in lldb_private::Process like:

class Process {
  virtual llvm::Error EvaluateCustomDWARFExpressionOpcode(uint16_t opcode,
                                                          uint64_t arg1,
                                                          uint64_t arg2) {
    return llvm::createStringError(std::errc::invalid_argument,
                                   "unhandled DWARF expression opcode");
  }
  // ...
};

could be added, and then the ProcessGDBRemote can pass this along to the GDB server. Anything in DWARFExpression needs to _only_ call virtual functions on lldb_private::Process/Thread/StackFrame and no deps should be added on custom plug-ins.

It would be fine to ask the lldb_private::Process class to evaluate any unknown DWARF expression opcodes like DW_OP_WASM_location and return the result.

While that idea has occurred to me too, I am not convinced it is a good one:

  • it replaces one odd dependency with another one. Why should a Process need to know how to evaluate a DWARF expression? Or even that DWARF exists for that matter? This seems totally unrelated to what other Process functions are doing currently...
  • I am not sure it even completely removes wasm knowledge from e.g. DWARFExpression -- the class would presumably still need to know how to parse this opcode.
  • the interface could get very complicated if we wanted to implement typed stacks present in DWARF5 -- presumably the API would need to return the type of the result, in addition to its value.

So is there memory to be read from the WASM runtime? Couldn't DW_OP_WASM_location 0x0 +8 be turned into an address that can be used to read the variable? It is also unclear what DW_OP_stack_value is used for here. The DWARF expression has no idea how many bytes to read for this value unless each virtual stack location knows how big it is? What happens if you have an array of a million items? That will not fit on the DWARF expression stack and each member would need to be read from memory?

It seems like the DW_OP_WASM_location + args should result in the address of the variable being pushed into the stack and the DW_OP_stack_value should be removed. This would mean at the end of the expression the address of the variable is on the stack and LLDB will just read it using the normal memory read? Am I missing something? Are there multiple memory regions? Are variables not considered to be in memory?

DW_OP_WASM_location 0x0 +8 is not really in memory, or more precisely, its runtime representation is an internal detail of the Wasm runtime.
WebAssembly code has a peculiar structure, see for example https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format for more details.
Ignoring memory for a moment, there are no registers in Wasm and instead Wasm instructions read/write from/to function locals, module globals and stack operands, which can only have one of these types:

  • i32: 32-bit integer
  • i64: 64-bit integer
  • f32: 32-bit floating point
  • f64: 64-bit floating point

There is still ongoing work in LLVM (https://reviews.llvm.org/D77353/new/#change-OJue38RNV2Gz) to define the best representation of these Wasm constructs in DWARF, but currently what is generated by LLVM has this format:

DW_OP_WASM_location wasm-op index

Where:

DW_OP_WASM_location := 0xED
wasm-op := wasm-local | wasm-global | wasm-operand-stack

wasm-local := 0x00 i:uleb128            (The value is located in the currently executing function’s index-th local)
wasm-global := 0x01 i:uleb128           (The value is located in the index-th global)
wasm-operand-stack := 0x02 i:uleb128    (The value is located in the index-th entry on the operand stack)
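A hypothetical decoder for this encoding might look like the following (the struct and function names are illustrative, not LLDB API; LLDB's real parsing happens inside DWARFExpression):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct WasmLocation {
  uint8_t store;  // 0: local, 1: global, 2: operand stack
  uint64_t index; // index within that store
};

// Decode a standard ULEB128 value, advancing the cursor.
static uint64_t ReadULEB128(const uint8_t *&p) {
  uint64_t result = 0;
  unsigned shift = 0;
  uint8_t byte;
  do {
    byte = *p++;
    result |= uint64_t(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return result;
}

// Parse "0xED wasm-op i:uleb128" as described in the grammar above.
static bool DecodeWasmLocation(const std::vector<uint8_t> &expr,
                               WasmLocation &loc) {
  const uint8_t *p = expr.data();
  if (expr.size() < 3 || *p++ != 0xED) // DW_OP_WASM_location
    return false;
  loc.store = *p++;
  loc.index = ReadULEB128(p);
  return true;
}
```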

https://yurydelendik.github.io/webassembly-dwarf/ describes the rationale behind the addition of DW_OP_WASM_location to DWARF.

For example a function like:

int add(int a, int b) { return a + b; }

Could be compiled to:

(func $add (param $lhs i32) (param $rhs i32) (result i32)
  local.get $lhs
  local.get $rhs
  i32.add)

and the corresponding DWARF would describe that:

  • the value of a can be retrieved as DW_OP_WASM_location 0 0 (first local in the function)
  • the value of b can be retrieved as DW_OP_WASM_location 0 1 (second local in the function)

Of course DW_OP_WASM_location cannot represent the values of complex types. For a complex type like a C++ array with 1M items:

uint8_t* p = new uint8_t[1000000];

DWARF would describe the location of the pointer p (for example, it could be in a local), and then the debugger would find the DWARF info that describes its type; it would then send a request like qWasmLocal to get the value from the Wasm runtime, and receive the value of p, let’s say 0x8000c000.
From there LLDB might query to read chunks of memory starting from 0x8000c000, if the user asks to explore the content of the array.

Note that not all Wasm code requires the new location description DW_OP_WASM_location. In many cases locations are encoded using preexisting codes. For example when compiling without optimizations, -O0, almost all variables are encoded as a delta from the frame pointer register. But the frame pointer register itself is often defined as a DW_OP_WASM_location:

0x00000112:   DW_TAG_subprogram
                DW_AT_low_pc	(0x0000000000000761)
                DW_AT_high_pc	(0x00000000000007db)
                DW_AT_frame_base	(DW_OP_WASM_location 0x0 +4, DW_OP_stack_value)
                DW_AT_linkage_name	("_Z10quick_sortI4NodeIyE4lessIS1_EEvPT_xT0_")
                DW_AT_name	("quick_sort<Node<unsigned long long>, less<Node<unsigned long long> > >")
                DW_AT_decl_file	("C:\dev\test\emscripten_tests\sort\.\sort.h")
                DW_AT_decl_line	(45)
                DW_AT_external	(true)

0x0000012a:     DW_TAG_formal_parameter
                  DW_AT_location	(DW_OP_fbreg +20)
                  DW_AT_name	("array")
                  DW_AT_type	(0x000003bb "Node<unsigned long long>*")
                  …

This would also work because LLDB would send a qWasmLocal to calculate the value of the frame register.
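Concretely, the resolution described above might go something like this sketch, where FetchLocal is a stand-in for the qWasmLocal round trip and the frame-pointer value is invented for illustration:

```cpp
#include <cassert>
#include <cstdint>

// DW_AT_frame_base is "DW_OP_WASM_location 0x0 +4, DW_OP_stack_value":
// the frame base is the value of local #4.
static uint64_t FetchLocal(uint32_t index) {
  return index == 4 ? 0x9000 : 0; // pretend local #4 holds the frame pointer
}

// Resolve a DW_OP_fbreg location: frame base plus a signed offset,
// e.g. DW_OP_fbreg +20 for the "array" parameter above.
static uint64_t ResolveFbregAddress(int64_t fbreg_offset) {
  uint64_t frame_base = FetchLocal(4); // DW_OP_WASM_location 0x0 +4
  return frame_base + fbreg_offset;    // DW_OP_fbreg offset
}
```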

Why do we need to override read memory? Is there more than one address space? Can't the DWARF expression DW_OP_WASM_location + args turn into an address that normal read memory can access? Or are the virtual stacks separate and not actually in the address space? If the virtual stack slot for locals/globals and stack values always know their sizes and can provide the contents, the DW_OP_WASM_location opcode should end up creating a buffer just like DW_OP_piece does and the value will be contained in there in the DWARF expression and there is no need for the DW_OP_stack_value?

How does normal memory reading differ from Wasm memory?

In WebAssembly the memory address space is separated from the code address space. Each Wasm module has a ‘Code’ section with the Wasm bytecode.
A Wasm module also has one (for the moment only one) Memory, which is a linear, byte-addressable range of bytes of a configured size.
So there are two separate address spaces for code and memory, and the DWARF info refers to both: address ranges are defined as offsets from the start of the Code section in the module, while location expressions imply reading from Wasm Memory instances.

This is why we need qWasmMem. When ProcessGDBRemote::ReadMemory is called during debugging, it sends "m" packets to the Wasm engine, which may be interpreted as reads from the module Code address space. But we also need a different way to express reads from the module Memory space.

For the code address space, the idea is to use a 64-bit virtual address space, where the code of each module is located at module_id << 32.

0x00000000`00000000 +------------------------------------+
                    |                                    |
                    |                                    |
                    |                                    |
0x00000001`00000000 +------------------------------------+
                    |  code module_id 1                  |
                    |                                    |
                    .                                    .
0x00000002`00000000 +------------------------------------+
                    |  code module_id 2                  |
                    .                                    .
0x00000003`00000000 +------------------------------------+
                    ~                                    ~

Classes ObjectFileWasm and DynamicLoaderWasmDYLD already support this, therefore LLDB emits requests to read memory at 64-bit addresses formed this way.

But to read from the memory instances, as said, we need a separate command, qWasmMem. This is the reason why Value::GetValueAsData is modified in this patch: to check whether we are debugging Wasm, in which case we want to use qWasmMem, because when evaluating a value we are reading from the Wasm Memory address space, not from the Code address space.

The GDB-remote query extensions are currently defined in the following way:

// Get a Wasm global value in the Wasm module specified.
// IN : $qWasmGlobal:frame_index;index
// OUT: $xx..xx

// Get a Wasm local value in the stack frame specified.
// IN : $qWasmLocal:frame_index;index
// OUT: $xx..xx

// Get a Wasm local from the operand stack at the index specified.
// IN : qWasmStackValue:frame_index;index
// OUT: $xx..xx

// Read Wasm memory.
// IN : $qWasmMem:frame_index;addr;len
// OUT: $xx..xx

// Get the current call stack.
// IN : $qWasmCallStack
// OUT: $xx..xxyy..yyzz..zz (A sequence of uint64_t values represented as consecutive 8-byte blocks).

All packets contain a frame_index, which the runtime can use to identify the Wasm module the query refers to.
The number of hex chars returned represents the size of the returned value. For qWasmGlobal, qWasmLocal and qWasmStackValue the size can currently only be 4 or 8 bytes, but for qWasmMem it should match the number of bytes requested in the query.
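For illustration, a minimal decoder for the $xx..xx payloads (two hex chars per byte; the byte order is whatever the stub sends) makes that size relationship explicit:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical decoder for a $xx..xx reply body: every byte is encoded as
// two hex chars, so the value size is reply.size() / 2.
inline std::vector<uint8_t> DecodeHexReply(const std::string &reply) {
  std::vector<uint8_t> bytes;
  for (size_t i = 0; i + 1 < reply.size(); i += 2)
    bytes.push_back(
        static_cast<uint8_t>(std::stoul(reply.substr(i, 2), nullptr, 16)));
  return bytes;
}
```

For example, a qWasmLocal reply body of "0000002a" decodes to a 4-byte value.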

These three could be boiled down to a "qEvaluateCustomDWARFExpressionOpcode" packet (shorter name please!) and the args like 0x0 and +8 can be sent. The result could provide the bytes for the value?

It is absolutely true that the first three packets (qWasmGlobal, qWasmLocal, qWasmStackValue) could be condensed into a single packet with an additional argument that describes the type of store.

qWasmCallStack: retrieve the Wasm call stack.

Seems like this packet doesn't need to be Wasm specific. Are there any other GDB remote packets that fetch stack traces already that we would re-use?

For qWasmCallStack, I could not find in the GDBRemote protocol (https://sourceware.org/gdb/current/onlinedocs/gdb/General-Query-Packets.html#General-Query-Packets) an existing command to query a thread call stack.

A new virtual function in lldb_private::Process like:

class Process {
  virtual llvm::Error EvaluateCustomDWARFExpressionOpcode(uint16_t opcode,
                                                          uint64_t arg1,
                                                          uint64_t arg2) {
    return llvm::createStringError(std::errc::invalid_argument,
                                   "unhandled DWARF expression opcode");
  }
};

could be added, and then the ProcessGDBRemote can pass this along to the GDB server. Anything in DWARFExpression needs to _only_ call virtual functions on lldb_private::Process/Thread/StackFrame and no deps should be added on custom plug-ins.

Having EvaluateCustomDWARFExpressionOpcode could work for DWARFExpression::Evaluate, with the drawbacks mentioned by @labath; but it would not help with Value::GetValueAsData(), I am afraid.

It would be fine to ask the lldb_private::Process class to evaluate any unknown DWARF expression opcodes like DW_OP_WASM_location and return the result.

While that idea has occurred to me too, I am not convinced it is a good one:

  • it replaces one odd dependency with another one. Why should a Process need to know how to evaluate a DWARF expression? Or even that DWARF exists for that matter? This seems totally unrelated to what other Process functions are doing currently...

But it is what people do in reality. DW_OP_low_user and DW_OP_high_user are ranges that are made available to people to customize their DWARF opcodes. If you don't handle it, you are hosed and can't show a variable location. And to make things worse, two different compilers could both use the same value in that range. So they made DWARF expressions customizable with no real attempt to make them work across different architectures; that is, unless you standardize an opcode and get it accepted into DWARF. The kind of DWARF location opcode being used here could easily be generalized into a DW_OP_get_stack_variable with a bunch of args, but at some point you have to talk to someone that is in communication with the runtime of the thing you are debugging to get the answer. So I do believe asking the process for this is not out of scope.

  • I am not sure it even completely removes wasm knowledge from e.g. DWARFExpression -- the class would presumably still need to know how to parse this opcode.

It is true and this is another hole in the "people can extend DWARF easily" scenario. We need to know what the opcode arguments are, and that would need to be hard-coded for now. But it wouldn't have to rely on anything except virtual functions on the generic lldb_private::Process/Thread APIs. In this case, as soon as we get an unknown opcode we would need to pass the DataExtractor and the offset into it, so the process could extract the arguments. Not clean, but better than making DWARFExpression depend on process plug-ins IMHO.

  • the interface could get very complicated if we wanted to implement typed stacks present in DWARF5 -- presumably the API would need to return the type of the result, in addition to its value.

DWARF5 just further clarifies what each value on the opcode stack is (file address, load address, the value itself, etc.). Right now DWARF expressions just infer what a value is based on the opcodes. So I don't see a huge problem here, as anything we do will need to work with DWARF5.

Thanks for the explanations of everything WASM related. I now understand much better what you have.

Read Wasm memory.
IN : $qWasmMem:frame_index;addr;len
OUT: $xx..xx

frame index seems weird in a memory read packet. Seems like the module ID should be passed instead of the frame index.

Reading memory could be handled with memory identifiers, or segments. Currently the packets for m and M only have an address, but someone out there must support reading from CODE and DATA address segments in the DSP world. I have an email out to Ted Woodward to see what they do for Qualcomm's hexagon DSPs. I'll let you know what I find. Maybe each WASM module can identify N segments it needs and each module would have its own unique segments. Are module IDs just 1-based indexes?

Could we just always use memory reading and have the address contain more info? Right now you have the top 32 bits for the module ID. Could it be something like:

struct WasmAddress {
  uint64_t module_id:16;
  uint64_t space:4; // 0 == code, 1 == data, 2 == global, 3==local, 4 == stack
  uint64_t frame_id:??;
  uint64_t addr: ??;
}

This would be a bitfield that would all fit into a 64 bit value and could then be easily sent to the GDB server with the standard m and M packets.
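To make the arithmetic concrete, here is one possible packing of that proposal. Since the frame_id and addr widths are left as ?? above, this sketch assumes a 12-bit frame_id and a 32-bit addr purely for illustration (16 + 4 + 12 + 32 = 64):

```cpp
#include <cstdint>

// Assumed field widths (illustrative only): module_id:16, space:4,
// frame_id:12, addr:32.
constexpr uint64_t PackWasmAddress(uint16_t module_id, uint8_t space,
                                   uint16_t frame_id, uint32_t addr) {
  return (uint64_t(module_id) << 48) | (uint64_t(space & 0xf) << 44) |
         (uint64_t(frame_id & 0xfff) << 32) | addr;
}

constexpr uint8_t SpaceOf(uint64_t packed) {
  return uint8_t((packed >> 44) & 0xf);
}
```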

Thanks for the explanations of everything WASM related. I now understand much better what you have.

Read Wasm memory.
IN : $qWasmMem:frame_index;addr;len
OUT: $xx..xx

frame index seems weird in a memory read packet. Seems like the module ID should be passed instead of the frame index.

Reading memory could be handled with memory identifiers, or segments. Currently the packets for m and M only have an address, but someone out there must support reading from CODE and DATA address segments in the DSP world. I have an email out to Ted Woodward to see what they do for Qualcomm's hexagon DSPs. I'll let you know what I find. Maybe each WASM module can identify N segments it needs and each module would have its own unique segments. Are module IDs just 1-based indexes?

True, for qWasmMem (and actually also for qWasmGlobal) it would be sufficient to pass the moduleId, but qWasmLocal and qWasmStackValue really need a frame index, and it is very easy for the runtime to find the module from a frame index; that's why I was passing only frame indices, for uniformity. Easy to change :)

For reading memory, yes, I came up with ad hoc mechanism that works for Wasm, but if I could reuse existing mechanisms already used by other architectures and supported by Wasm, obviously it would be better.

Could we just always use memory reading and have the address contain more info? Right now you have the top 32 bits for the module ID. Could it be something like:

struct WasmAddress {
  uint64_t module_id:16;
  uint64_t space:4; // 0 == code, 1 == data, 2 == global, 3==local, 4 == stack
  uint64_t frame_id:??;
  uint64_t addr: ??;
}

This would be a bitfield that would all fit into a 64 bit value and could then be easily sent to the GDB server with the standard m and M packets.

This is interesting. We could certainly use a few bits to specify the space (0 == code, 1 == data, 2 == global, 3 == local, 4 == stack) and then have either the module_id or the frame_index, according to the space, and just send "m" packets.

struct WasmAddress {
  uint64_t scope:3;
  uint64_t module_id_or_frame_index:29;
  uint64_t address: 32;
}

But then these "m" packets would have to be interpreted according to these rules by a Wasm-aware GDB-remote stub, so I am not sure we would gain much besides avoiding the introduction of four new custom query commands.
In a function like Value::GetValueAsData we would still need Wasm-specific code to generate the memory address in this format, and there it is actually easier to pass the current frame_index, which is readily available, than to calculate the corresponding module_index.

What if the wasm engine actually made some effort to present a more "conventional" view of the wasm "process"? I.e., in addition to the "implied" PC, we could have an "implied" SP (and maybe an FP too). Then it could lay out the call stack and the function arguments in "memory", and point the "SP" at it so that lldb is able to reconstruct the frames & variables using the "normal" algorithm. For the sake of exposition, let's assume the following "encoding" of the 64-bit memory space:

unsigned:16 type; // 0x5555 = stack
unsigned:16 tid; // are there threads in wasm?
unsigned:16 frame; // grows down, 0x0000 = bottommost frame (main), 0x0001 = second from bottom, etc.
unsigned:16 address;

Then the engine could say that the "SP" of thread 0x1234 is 0x5555123400050000 and the "FP" is 0x5555123400048010, and then have the memory contents be:

0x5555123400048020: // third "local" (of frame 4)
0x5555123400048018: // second "local" (of frame 4)
0x5555123400048010: // first "local" (of frame 4)
0x5555123400048008: 0x5555123400038010 // previous FP (frame 3)
0x5555123400048000: ???? // previous PC (frame 3)
0x5555123400040010: // third "argument" (of frame 4)
0x5555123400040008: // second "argument" (of frame 4)
0x5555123400040000: // first "argument" (of frame 4)
0x5555123400038020: // third "local" (of frame 3)
0x5555123400038018: // second "local" (of frame 3)
0x5555123400038010: // first "local" (of frame 3)
0x5555123400038008: 0x5555123400028010 // previous FP (frame 2)
0x5555123400038000: ???? // previous PC (frame 2)
etc.

Then all that would be needed is to translate DW_OP_WASM_location into an appropriate FP+offset combo. Somehow...

I realize that this is basically throwing the problem "over the fence", and asking the other side to deal with things, but I am starting to get sceptical that we will be able to come up with a satisfactory solution within lldb.
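A quick sketch of the 16/16/16/16 encoding proposed above, with hypothetical helper names, reproduces the example "FP" value:

```cpp
#include <cstdint>

// Packs the type/tid/frame/address fields from the layout above into one
// 64-bit synthetic "address".
constexpr uint64_t MakeSyntheticAddress(uint16_t type, uint16_t tid,
                                        uint16_t frame, uint16_t address) {
  return (uint64_t(type) << 48) | (uint64_t(tid) << 32) |
         (uint64_t(frame) << 16) | address;
}
```

MakeSyntheticAddress(0x5555, 0x1234, 4, 0x8010) yields 0x5555123400048010, the "FP" of frame 4 in the example layout.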

While that idea has occurred to me too, I am not convinced it is a good one:

  • it replaces one odd dependency with another one. Why should a Process need to know how to evaluate a DWARF expression? Or even that DWARF exists for that matter? This seems totally unrelated to what other Process functions are doing currently...

But it is what people do in reality. DW_OP_low_user and DW_OP_high_user are ranges that are made available to people to customize their DWARF opcodes. If you don't handle it, you are hosed and can't show a variable location. And to make things worse, two different compilers could both use the same value in that range. So they made DWARF expressions customizable with no real attempt to make them work across different architectures; that is, unless you standardize an opcode and get it accepted into DWARF. The kind of DWARF location opcode being used here could easily be generalized into a DW_OP_get_stack_variable with a bunch of args, but at some point you have to talk to someone that is in communication with the runtime of the thing you are debugging to get the answer. So I do believe asking the process for this is not out of scope.

I think the "at some point" part is very important here. I get how dwarf expressions are meant to be extended, and that doing that is tricky, but I don't think that automatically means that we should delegate that to a different process. There are various ways that could be implemented and the delegation could be performed at different levels. For example the DWARFExpression class could be made into a class hierarchy and we could have a subclass of it for each architecture with funny operations. Then the subclass would have enough knowledge about wasm to properly parse the expression and evaluate it (possibly by making additional queries to someone else) -- this is sort of what the current patch does, without the "subclass" part.

The problem I have with a function like EvaluateCustomDWARFExpressionOpcode is that it is completely unlike anything else that our process class needs to deal with. The Process deals with threads (and how to control them), memory (read/write) and, to a limited degree, with modules. It knows nothing about "stack frames" or "variables" or "dwarf expressions" -- these are concepts built on top of that. This becomes even more true if we start to talk about the gdb-remote protocol instead of the lldb Process abstraction.

  • I am not sure it even completely removes wasm knowledge from e.g. DWARFExpression -- the class would presumably still need to know how to parse this opcode.

It is true and this is another hole in the "people can extend DWARF easily" scenario. We need to know what the opcode arguments are, and that would need to be hard-coded for now. But it wouldn't have to rely on anything except virtual functions on the generic lldb_private::Process/Thread APIs. In this case, as soon as we get an unknown opcode we would need to pass the DataExtractor and the offset into it, so the process could extract the arguments.

Not only that, we might need to pass in the entire DWARF stack, in case the opcode depends on some of the stack arguments.

Not clean, but better than making DWARFExpression depend on process plug-ins IMHO.

The dependence on a process plugin could be dealt with by making GetWasmGlobal/Local/etc a virtual function on the Process class. Also not clean, but it's not clear to me whether it's cleaner than having EvaluateCustomDWARFExpressionOpcode virtual function.

Anyway, just the fact that we can't come up with a "clean" solution doesn't mean that we should accept an "unclean" one. This wouldn't be the first feature that ends up sitting in a fork somewhere because it does not integrate cleanly with llvm (probably the largest example of that is swift-lldb).

And I believe the current problems are just the tip of the iceberg. I can't imagine what hoops we'll need to jump through once we start evaluating expressions...

  • the interface could get very complicated if we wanted to implement typed stacks present in DWARF5 -- presumably the API would need to return the type of the result, in addition to its value.

DWARF5 just further clarifies what each value on the opcode stack is (file address, load address, the value itself, etc.). Right now DWARF expressions just infer what a value is based on the opcodes. So I don't see a huge problem here, as anything we do will need to work with DWARF5.

That doesn't sound right. DWARF5 introduces opcodes like DW_OP_deref_type which takes a _die offset_ as an argument, so that you can specify the type of the dereferenced result value you are accessing. Fortunately that offset must refer to a DW_TAG_base_type, which means the most interesting aspects are byte size and signedness (and byte size is sort of implied by the result), but that still leaves the door open to more languages with more complicated "base" types.

What if the wasm engine actually made some effort to present a more "conventional" view of the wasm "process"? I.e., in addition to the "implied" PC, we could have an "implied" SP (and maybe an FP too). Then it could lay out the call stack and the function arguments in "memory", and point the "SP" at it so that lldb is able to reconstruct the frames & variables using the "normal" algorithm. For the sake of exposition, let's assume the following "encoding" of the 64-bit memory space:

unsigned:16 type; // 0x5555 = stack
unsigned:16 tid; // are there threads in wasm?
unsigned:16 frame; // grows down, 0x0000 = bottommost frame (main), 0x0001 = second from bottom, etc.
unsigned:16 address;

Then the engine could say that the "SP" of thread 0x1234 is 0x5555123400050000 and the "FP" is 0x5555123400048010, and then have the memory contents be:

0x5555123400048020: // third "local" (of frame 4)
0x5555123400048018: // second "local" (of frame 4)
0x5555123400048010: // first "local" (of frame 4)
0x5555123400048008: 0x5555123400038010 // previous FP (frame 3)
0x5555123400048000: ???? // previous PC (frame 3)
0x5555123400040010: // third "argument" (of frame 4)
0x5555123400040008: // second "argument" (of frame 4)
0x5555123400040000: // first "argument" (of frame 4)
0x5555123400038020: // third "local" (of frame 3)
0x5555123400038018: // second "local" (of frame 3)
0x5555123400038010: // first "local" (of frame 3)
0x5555123400038008: 0x5555123400028010 // previous FP (frame 2)
0x5555123400038000: ???? // previous PC (frame 2)
etc.

Then all that would be needed is to translate DW_OP_WASM_location into an appropriate FP+offset combo. Somehow...

I realize that this is basically throwing the problem "over the fence", and asking the other side to deal with things, but I am starting to get sceptical that we will be able to come up with a satisfactory solution within lldb.

When you say

translate DW_OP_WASM_location into an appropriate FP+offset combo.

do you mean that LLVM should generate these FP+offset combos rather than DW_OP_WASM_location or that LLDB should somehow do this translation?
I think the engine can do more to help here, but not a lot more, I am afraid. Yes, it could expose an implied “SP” and “FP”, and that should be sufficient to represent locals and arguments and make stack walking more orthodox. But DW_OP_WASM_location also describes locations in the set of Wasm globals and in the Wasm operand stack, so we would need at least a second, parallel stack to represent the operand stack.

Also, for C++ LLVM emits code to maintain a “shadow stack” in the linear memory of the module, and location expressions like DW_OP_fbreg +N are already used to describe the location of a parameter or a local variable in that shadow stack. The stack frame pointer for that function is described with DW_AT_frame_base, expressed as a DW_OP_WASM_location expression.

In the end walking the stack is not a big problem, its logic can already be encapsulated in a Unwind-derived plugin class. The issues are:

  • in DWARFExpression::Evaluate, where we need to handle DW_OP_WASM_location somehow, and
  • in Value::GetValueAsData, where we need to read from the memory of the current Wasm module, which is a space separated from the address space of code.

I understand that it is not easy to plug in this functionality in a very neat way, and maybe I am missing something else here, but if there are no other places involved maybe we can come up with a clean solution.

And I believe the current problems are just the tip of the iceberg. I can't imagine what hoops we'll need to jump through once we start evaluating expressions...

Expression evaluation works, in my prototype, for simple expressions. For complex expressions I see logged errors like this, in IRInterpreter::CanInterpret():

Unsupported instruction: %call = call float @_ZNK4Vec3IfE3dotERKS0_(%class.Vec3* %7, %class.Vec3* dereferenceable(12) %8)

It’s not clear to me if the problem is caused by the debug symbols or by the IR generated for Wasm… is there any doc where I could learn more about expression evaluation in LLDB? It’s a topic that really interests me, even outside the scope of this Wasm work.

ted added a subscriber: ted. Apr 29 2020, 10:25 AM

Hi Paolo,
@clayborg asked me to look at this, because I've worked with systems that have multiple address spaces. I was thinking, instead of a WebAssembly specific memory read, we should implement an optional generic memory read and write with memory space support.

So instead of qWasmMem:frame_index;addr;len, we have something like qMemSpaceRead:addr;space;len and qMemSpaceWrite:addr;space;len;data.

"space" is stub dependent - it's just a number, and can mean different things for different targets. It could be different modules, like you talk about here, or different physical memory areas like in a Motorola 56K DSP, or different ways to get at the same memory (physical/virtual/cacheable, or directly controlling MESI bits instead of relying on TLB entries) like I implemented in FSLDBG.

If we do this, other targets that want to work with memory spaces can use them instead of having to implement their own extensions.
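A sketch of what building these proposed packets might look like on the client side; the packet names and hex field formatting are assumptions that simply mirror the addr;space;len shape suggested above:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical builders for the proposed memory-space packets.
inline std::string BuildMemSpaceRead(uint64_t addr, uint32_t space,
                                     uint64_t len) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "qMemSpaceRead:%llx;%x;%llx",
                (unsigned long long)addr, space, (unsigned long long)len);
  return buf;
}

inline std::string BuildMemSpaceWrite(uint64_t addr, uint32_t space,
                                      const std::string &hex_data) {
  char buf[96];
  // len is derived from the data: two hex chars per byte.
  std::snprintf(buf, sizeof buf, "qMemSpaceWrite:%llx;%x;%llx;%s",
                (unsigned long long)addr, space,
                (unsigned long long)(hex_data.size() / 2), hex_data.c_str());
  return buf;
}
```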

About the expression problem - the IR Interpreter doesn't handle some complex expressions. It needs work, but most targets use JIT. On Hexagon, we only recently enabled JIT in the compiler, and haven't done it in the debugger yet, so we use the IR Interpreter for everything. Unfortunately I haven't had time to dive into it to make it better.

In D78801#2010536, @ted wrote:

Hi Paolo,
@clayborg asked me to look at this, because I've worked with systems that have multiple address spaces. I was thinking, instead of a WebAssembly specific memory read, we should implement an optional generic memory read and write with memory space support.

So instead of qWasmMem:frame_index;addr;len, we have something like qMemSpaceRead:addr;space;len and qMemSpaceWrite:addr;space;len;data.

"space" is stub dependent - it's just a number, and can mean different things for different targets. It could be different modules, like you talk about here, or different physical memory areas like in a Motorola 56K DSP, or different ways to get at the same memory (physical/virtual/cacheable, or directly controlling MESI bits instead of relying on TLB entries) like I implemented in FSLDBG.

If we do this, other targets that want to work with memory spaces can use them instead of having to implement their own extensions.

About the expression problem - the IR Interpreter doesn't handle some complex expressions. It needs work, but most targets use JIT. On Hexagon, we only recently enabled JIT in the compiler, and haven't done it in the debugger yet, so we use the IR Interpreter for everything. Unfortunately I haven't had time to dive into it to make it better.

Hi Ted,

I really like the idea of defining generic commands for reading and writing memory on systems with multiple address spaces! In fact there is no reason why that should be Wasm-specific.

Also, to eliminate qWasmCallStack we could maybe add the call stack (a list of PCs) to the stop packet. The format of a stop packet is:

T AA n1:r1;n2:r2;…

The program received signal number AA (a two-digit hexadecimal number). [...] Each ‘n:r’ pair is interpreted as follows:
If n is a hexadecimal number, it is a register number, and the corresponding r gives that register’s value. [...]
If n is ‘thread’, then r is the thread-id of the stopped thread, as specified in thread-id syntax.
If n is ‘core’, then r is the hexadecimal number of the core on which the stop event was detected.
Otherwise, GDB should ignore this ‘n:r’ pair and go on to the next; this allows us to extend the protocol in the future.

So adding a 'stack:xxxxx...xxx' pair should be possible. If we reuse m, M in place of qWasmLocal, qWasmGlobal and qWasmStackValue and add generic qMemSpaceRead/qMemSpaceWrite for the memory, then there would be no new Wasm-specific command to add to the protocol.
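If the call stack were delivered as such a 'stack:...' hex blob (consecutive 8-byte blocks, as qWasmCallStack already returns), parsing it could look like this sketch; the little-endian byte order within each block is an assumption, since the packet description does not pin it down:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical parser: every 16 hex chars form one PC, assumed
// little-endian within each 8-byte block.
inline std::vector<uint64_t> ParseCallStackHex(const std::string &hex) {
  std::vector<uint64_t> pcs;
  for (size_t i = 0; i + 16 <= hex.size(); i += 16) {
    uint64_t pc = 0;
    for (int b = 0; b < 8; ++b)
      pc |= uint64_t(std::stoul(hex.substr(i + 2 * b, 2), nullptr, 16))
            << (8 * b);
    pcs.push_back(pc);
  }
  return pcs;
}
```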

Sounds like we have an approach to try! I would like to see solutions that are not WASM specific when possible.

One other way of doing the memory read/write with segments: maybe the "m" and "M" packets can be overloaded to include the memory segment identifier if the remote GDB server responds with "OK" to a QEnable packet. LLDB currently enables some features in the remote stub when a QEnable packet responds with "OK", like:

<  23> send packet: $QSetDetachOnError:1#f8
<   6> read packet: $OK#00

What if we added a new "QEnableMemorySegments" packet that then requires all memory reads/writes to include the memory segment in the packet? Without this or if the GDB server responds with "$#00" (unimplemented), the memory read packets look like:

m addr,length
M addr,length:XX…

If the GDB server responds with "OK" to the QEnableMemorySegments, then the packets become:

m addr,segment,length
M addr,segment,length:XX…

I am guessing we can't be the first to do segmented memory read/write via a GDB server, so it would be good to look around to see what others may have done.
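Sketched client-side, the handshake above would simply switch which form of the m packet gets emitted. This is a sketch of the proposal, not an existing LLDB API; QEnableMemorySegments is the hypothetical packet named above:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// "m addr,length" normally; "m addr,segment,length" once the stub has
// answered "OK" to the hypothetical QEnableMemorySegments.
inline std::string BuildMemReadPacket(uint64_t addr, uint64_t len,
                                      bool segments_enabled,
                                      uint32_t segment) {
  char buf[64];
  if (segments_enabled)
    std::snprintf(buf, sizeof buf, "m%llx,%x,%llx", (unsigned long long)addr,
                  segment, (unsigned long long)len);
  else
    std::snprintf(buf, sizeof buf, "m%llx,%llx", (unsigned long long)addr,
                  (unsigned long long)len);
  return buf;
}
```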

While that idea has occurred to me too, I am not convinced it is a good one:

  • it replaces one odd dependency with another one. Why should a Process need to know how to evaluate a DWARF expression? Or even that DWARF exists for that matter? This seems totally unrelated to what other Process functions are doing currently...

But it is what people do in reality. DW_OP_low_user and DW_OP_high_user are ranges that are made available to people to customize their DWARF opcodes. If you don't handle it, you are hosed and can't show a variable location. And to make things worse, two different compilers could both use the same value in that range. So they made DWARF expressions customizable with no real attempt to make them work across different architectures; that is, unless you standardize an opcode and get it accepted into DWARF. The kind of DWARF location opcode being used here could easily be generalized into a DW_OP_get_stack_variable with a bunch of args, but at some point you have to talk to someone that is in communication with the runtime of the thing you are debugging to get the answer. So I do believe asking the process for this is not out of scope.

I think the "at some point" part is very important here. I get how dwarf expressions are meant to be extended, and that doing that is tricky, but I don't think that automatically means that we should delegate that to a different process. There are various ways that could be implemented and the delegation could be performed at different levels. For example the DWARFExpression class could be made into a class hierarchy and we could have a subclass of it for each architecture with funny operations. Then the subclass would have enough knowledge about wasm to properly parse the expression and evaluate it (possibly by making additional queries to someone else) -- this is sort of what the current patch does, without the "subclass" part.

The problem I have with a function like EvaluateCustomDWARFExpressionOpcode is that it is completely unlike anything else that our process class needs to deal with. The Process deals with threads (and how to control them), memory (read/write) and, to a limited degree, with modules. It knows nothing about "stack frames" or "variables" or "dwarf expressions" -- these are concepts built on top of that. This becomes even more true if we start to talk about the gdb-remote protocol instead of the lldb Process abstraction.

  • I am not sure it even completely removes wasm knowledge from e.g. DWARFExpression -- the class would presumably still need to know how to parse this opcode.

It is true and this is another hole in the "people can extend DWARF easily" scenario. We need to know what the opcode arguments are, and that would need to be hard-coded for now. But it wouldn't have to rely on anything except virtual functions on the generic lldb_private::Process/Thread APIs. In this case, as soon as we get an unknown opcode we would need to pass the DataExtractor and the offset into it, so the process could extract the arguments.

Not only that, we might need to pass in the entire DWARF stack, in case the opcode depends on some of the stack arguments.

Yes

Not clean, but better than making DWARFExpression depend on process plug-ins IMHO.

The dependence on a process plugin could be dealt with by making GetWasmGlobal/Local/etc a virtual function on the Process class. Also not clean, but it's not clear to me whether it's cleaner than having EvaluateCustomDWARFExpressionOpcode virtual function.

I would really like to see a solution that does not include "Wasm" in any process or thread virtual functions. I am fine with something like:

enum IndexedVariableType {
 Global,
 Local,
 Stack
};

Process::GetIndexedVariable(IndexedVariableType type, size_t index, ...)

I would rather see a DWARF function added to Process than a WASM-specific one.
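A standalone sketch of that shape, where the class and method names are hypothetical and only mirror lldb_private::Process in spirit: the base class knows nothing about Wasm, and a runtime-aware subclass overrides the hook.

```cpp
#include <cstddef>
#include <cstdint>

enum class IndexedVariableType { Global, Local, Stack };

class Process {
public:
  virtual ~Process() = default;
  // Default: the process has no notion of indexed runtime variables.
  virtual bool GetIndexedVariable(IndexedVariableType /*type*/,
                                  size_t /*index*/, uint64_t /*frame*/,
                                  uint64_t &/*value*/) {
    return false;
  }
};

class ProcessWasm : public Process {
public:
  bool GetIndexedVariable(IndexedVariableType type, size_t index,
                          uint64_t frame, uint64_t &value) override {
    // A real implementation would issue qWasmGlobal/qWasmLocal/
    // qWasmStackValue over gdb-remote; this stub fakes one local
    // purely for illustration.
    (void)frame;
    if (type == IndexedVariableType::Local && index == 0) {
      value = 42;
      return true;
    }
    return false;
  }
};
```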

Anyway, just the fact that we can't come up with a "clean" solution doesn't mean that we should accept an "unclean" one. This wouldn't be the first feature that ends up sitting in a fork somewhere because it does not integrate cleanly with llvm (probably the largest example of that is swift-lldb).

I think there is a clean way to do memory reading and writing with segment IDs and also to access variables from the process runtime.

Actually, that brings up an idea: a process runtime plug-in for languages that have a runtime that can be communicated with. Java has a runtime, Wasm has a runtime, most Javascript engines have runtimes. Maybe this wouldn't be too hard to abstract? Then we can add the GetGlobal(index), GetLocal(index), GetStack(index) to this plug-in?

And I believe the current problems are just the tip of the iceberg. I can't imagine what hoops we'll need to jump through once we start evaluating expressions...

  • the interface could get very complicated if we wanted to implement typed stacks present in DWARF5 -- presumably the API would need to return the type of the result, in addition to its value.

DWARF5 just further clarifies what each value on the opcode stack is (file address, load address, the value itself, etc.). Right now DWARF expressions just infer what a value is based on the opcodes. So I don't see a huge problem here, as anything we do will need to work with DWARF5.

That doesn't sound right. DWARF5 introduces opcodes like DW_OP_deref_type which takes a _die offset_ as an argument, so that you can specify the type of the dereferenced result value you are accessing. Fortunately that offset must refer to a DW_TAG_base_type, which means the most interesting aspects are byte size and signedness (and byte size is sort of implied by the result), but that still leaves the door open to more languages with more complicated "base" types.

Sorry for the overly long post. I tried to reply to all messages from last night. I don't claim that all of my comments are consistent with one another -- that's a reflection of the fact that I really don't know what is the right solution...

Also, to eliminate qWasmCallStack we could maybe add the call stack (a list of PCs) to the stop packet. The format of a stop packet is:

T AA n1:r1;n2:r2;…

The program received signal number AA (a two-digit hexadecimal number). [...] Each ‘n:r’ pair is interpreted as follows:
If n is a hexadecimal number, it is a register number, and the corresponding r gives that register’s value. [...]
If n is ‘thread’, then r is the thread-id of the stopped thread, as specified in thread-id syntax.
If n is ‘core’, then r is the hexadecimal number of the core on which the stop event was detected.
Otherwise, GDB should ignore this ‘n:r’ pair and go on to the next; this allows us to extend the protocol in the future.

So adding a 'stack:xxxxx...xxx' pair should be possible. If we reuse m, M in place of qWasmLocal, qWasmGlobal and qWasmStackValue and add generic qMemSpaceRead/qMemSpaceWrite for the memory, then there would be no new Wasm-specific command to add to the protocol.

I don't see a real difference between these two options. The only reason the stack: key is not wasm-specific is because you excluded wasm from the name. If we renamed qWasmCallStack to qCallStack, the effect would be the same. For me the question is more fundamental -- who/how/why should be computing the call stack, not the exact protocol details.

It may also be interesting to note that the stop-reply packet kind of also includes the call stack information via the memory field. The idea there is that the stub will walk the FP chain and send over the relevant bits of memory. However, there is a big difference -- all of this is just a hint to the client and an optimization to reduce packet count. It's still the client who determines the final call stack.

I really like the idea of defining generic commands for reading and writing memory on systems with multiple address spaces! In fact there is no reason why that should be Wasm-specific.

I know a lot of people are interested in adding address spaces to lldb. But I don't think the problem is adding a gdb-remote extension to the m packet -- that is the easy part. The same thing could be done to the "memory read" command in lldb. The interesting stuff begins when you want to go beyond that: If the address space is just an opaque number, then what does that mean? What can lldb do with such address spaces? How do they map to the address spaces in llvm? How do they relate to addresses in object files (no common format has support for them)? How about addresses in debug info? etc.

If dwarf had a notion of address spaces then there'd probably be no need for DW_OP_WASM_location. If the dwarf committee hasn't been able to come up with a unified way to represent address spaces, then I'm not sure we will do better. And we'll still be left with the job of translating dwarf (and other) entries into this other concept.

When you say

translate DW_OP_WASM_location into an appropriate FP+offset combo.

do you mean that LLVM should generate these FP+offset combos rather than DW_OP_WASM_location or that LLDB should somehow do this translation?

I meant the latter, though if we could somehow achieve the former, it would be even better, obviously. :)

I think the engine can do more to help here, but not a lot more, I am afraid. Yes, it could expose an implied “SP” and “FP”, and that should be sufficient to represent locals and arguments and make stack walking more orthodox. But DW_OP_WASM_location also describes locations in the set of wasm globals and in the Wasm operand stack, so we would need at least a second, parallel stack to represent the operand stack.

Also, for C++ LLVM emits code to maintain a “shadow stack” in the linear memory of the module, and location expressions like DW_OP_fbreg +N are already used to describe the location of a parameter or a local variable in that shadow stack. The stack frame pointer for that function is described with DW_AT_frame_base, expressed as a DW_OP_WASM_location expression.

So, the pointer to this "shadow stack" is one of the function arguments, represented as DW_OP_WASM_location 0x0 (local) + constant. And then we get to a variable on the shadow stack by "dereferencing" this value, and adding another constant. Is that right?

That sounds like it should be expressible in standard dwarf. If we set DW_AT_frame_base to DW_OP_regX FP, then "real" arguments could be expressed as DW_OP_fbreg +Y and "shadow" arguments as DW_OP_fbreg +index_of_shadow_argument, DW_OP_deref, DW_OP_const Y, DW_OP_add.
I guess the reason this hasn't been done is because that would give off the impression that all of these things are in memory, in the same address space, but the "real" arguments don't have a memory location at all? But if they don't have a memory location, is it right to represent them as address spaces?
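To make the arithmetic concrete, here is a toy model of that op sequence (DW_OP_fbreg, DW_OP_deref, DW_OP_const, DW_OP_add); the frame base, slot offset, and member offset are made-up numbers, and the map stands in for readable memory:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy illustration of the DWARF sequence sketched above for a "shadow"
// argument: DW_OP_fbreg +slot, DW_OP_deref, DW_OP_const off, DW_OP_add.
// `memory` is a stand-in for whatever storage holds the frame's locals.
uint64_t ShadowArgAddress(const std::map<uint64_t, uint64_t> &memory,
                          uint64_t frame_base, int64_t shadow_slot_offset,
                          uint64_t member_offset) {
  // DW_OP_fbreg +shadow_slot_offset: address of the slot holding the
  // shadow-stack pointer.
  uint64_t slot_addr = frame_base + shadow_slot_offset;
  // DW_OP_deref: load the shadow-stack pointer from that slot.
  uint64_t shadow_sp = memory.at(slot_addr);
  // DW_OP_const member_offset; DW_OP_add: the variable's offset within the
  // shadow frame.
  return shadow_sp + member_offset;
}
```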

In the end walking the stack is not a big problem, its logic can already be encapsulated in a Unwind-derived plugin class.

That is true, and there are elements of the current solution there that I like a lot. The thing I am resisting is to put all of this stuff in the ProcessGDBRemote class. Instead of trying to generalize it so that it can handle everything (and generalize to the wrong thing), I very much like the idea of introducing a WasmProcess plugin class that handles all wasm stuff. If that class happens to use parts of gdb-remote, then so be it, but it means that it's not a problem for that class to use a dialect of the gdb-remote protocol, which includes as many wasm-specific packets as it needs. Then this class can create its own implementation of the "Unwind" interface, which will use some WasmProcess-specific apis to unwind, but that will also be ok, since both classes will be wasm-specific.

The question is whether something similar can be done for the other two cases. I believe it might be possible for the DWARFExpression. We just need to have a way to separate the creation of the dwarf expression data from the process of evaluating it. Right now, both of these things happen in SymbolFileDWARF -- it creates a DWARFExpression object which both holds the data and knows how to evaluate it. But I don't believe SymbolFileDWARF needs the second part. If we could make it so that something else is responsible for creating the evaluator for dwarf expressions, that something could create a WasmDWARFExpression which would know about DW_OP_WASM_location and WasmProcess, and could evaluate it.

The GetValueAsData problem is trickier, particularly as I'm not even sure the current implementation is correct. Are you sure that you really need the _current_ module there? What happens if I use "target variable" to display a variable from a different module? What if I then dereference that variable? If the dereference should happen in the context of the other module, then I guess the "module" should be a property of the value, not of the current execution context. And it sounds like some address space-y thing would help. But that might require teaching a lot of places in lldb about address spaces, in particular that dereferencing a value in one address space should produce a value in the same address space (at least for "near" pointers or something).
If we should really use the current frame as the context, then I guess we'd need some sort of interface to ask a stack frame to get the value of a "Value".

And I believe the current problems are just the tip of the iceberg. I can't imagine what hoops we'll need to jump through once we start evaluating expressions...

Expression evaluation works, in my prototype, for simple expressions. For complex expressions I see logged errors like this, in IRInterpreter::CanInterpret():

Unsupported instruction: %call = call float @_ZNK4Vec3IfE3dotERKS0_(%class.Vec3* %7, %class.Vec3* dereferenceable(12) %8)

It’s not clear to me if the problem is caused by the debug symbols or by the IR generated for Wasm… is there any doc where I could learn more about expression evaluation in LLDB? It’s a topic that really interests me, even outside the scope of this Wasm work.

Yes, this is because complex expressions require us to inject code into the inferior to run it. I expect that to be quite a tough nut to crack. Even so, I do find it pretty impressive that simple expressions work.

I am not aware of any good documentation for the expression evaluator. Probably the closest thing are some devmtg tutorials.

Not only that, we might need to pass in the entire DWARF stack, in case the opcode depends on some of the stack arguments.

Yes

In principle, I am fine with having a "Process" (or someone else -- we may want to do this for not-yet-started processes a'la "target variable" too) specifying the semantics of a dwarf expression. Though in that case, I think it would be cleaner to just have this entity provide a "DWARFEvaluator" object, which will handle the entirety of the evaluation. The fact that 99% of the different evaluators will be identical can be handled by putting that code in a common base class.

I would really like to see a solution that does not include "Wasm" in any process or thread virtual functions. I am fine with something like:

enum IndexedVariableType {
  Global,
  Local,
  Stack
};

Process::GetIndexedVariable(IndexedVariableType type, size_t index, ...)

The thing I fear with such a solution is that it will be wasm-specific in everything but the name. It's very hard to create a good generic interface with just a few (one?) data points. There are plenty of examples for that in lldb, where an API tries to look very generic, but in reality it only makes sense for darwin/macho/objc...

Actually, that brings up an idea: a process runtime plug-in for languages that have a runtime that can be communicated with. Java has a runtime, Wasm has a runtime, most Javascript engines have runtimes. Maybe this wouldn't be too hard to abstract? Then we can add the GetGlobal(index), GetLocal(index), GetStack(index) to this plug-in?

I think that is a very interesting idea to explore. We already have language runtime plugins, and they do have the ability to refine the type of a variable/value. Maybe they could also somehow help with computing the value of a variable? Then instead of a GetGlobal/Local(index), we could have GetValue(Variable), and they would be the ones responsible for making sense of the dwarf expressions and address spaces?

The thing I am resisting is to put all of this stuff in the ProcessGDBRemote class. Instead of trying to generalize it so that it can handle everything (and generalize to the wrong thing), I very much like the idea of introducing a WasmProcess plugin class that handles all wasm stuff. If that class happens to use parts of gdb-remote, then so be it, but it means that it's not a problem for that class to use a dialect of the gdb-remote protocol, which includes as many wasm-specific packets as it needs. Then this class can create its own implementation of the "Unwind" interface, which will use some WasmProcess-specific apis to unwind, but that will also be ok, since both classes will be wasm-specific.

I think that this would solve a lot of problems. WasmProcess could inherit from GDBRemoteProcess and could send itself Wasm-specific commands like qWasmMem just by calling GetGDBRemote().SendPacketAndWaitForResponse(). Then there would be no changes to ProcessGDBRemote at all.
For walking the call stack I would then keep the current design, with the Unwind-derived class that sends a command to get the call stack from the stub. It’s true that normally it is the client that calculates call stacks, but it does so partly because it cannot really ask the stub for them; here we have a runtime that can provide this information.

If dwarf had a notion of address spaces then there'd probably be no need for DW_OP_WASM_location. If the dwarf committee hasn't been able to come up with a unified way to represent address spaces, then I'm not sure we will do better. And we'll still be left with the job of translating dwarf (and other) entries into this other concept.

Yes, it is possible to come up with some representation of a unified address space for Wasm locals, globals and stack items (and also code and memory). But it's also true that locals, globals and stack items don’t really have a memory location. The reason for the introduction of DW_OP_WASM_location is indeed that there was nothing in DWARF that could represent these entities well. While there are certainly other architectures with multiple memory spaces, the execution model of WebAssembly is quite peculiar in this respect, I think.

The question is whether something similar can be done for the other two cases. I believe it might be possible for the DWARFExpression. We just need to have a way to separate the creation of the dwarf expression data from the process of evaluating it. Right now, both of these things happen in SymbolFileDWARF -- it creates a DWARFExpression object which both holds the data and knows how to evaluate it. But I don't believe SymbolFileDWARF needs the second part. If we could make it so that something else is responsible for creating the evaluator for dwarf expressions, that something could create a WasmDWARFExpression which would know about DW_OP_WASM_location and WasmProcess, and could evaluate it.

Separating the DWARFExpression data from the logic to evaluate it would certainly make sense. I think this must be a general problem: since DWARF provides a way to define custom location expressions, different architectures might define specific DWARF codes that they need to handle, so it would be great if we had a DWARFEvaluator that was also pluggable. I don’t know how complex it would be to refactor DWARFExpression in this way, but I can investigate.

The GetValueAsData problem is trickier, particularly as I'm not even sure the current implementation is correct. Are you sure that you really need the _current_ module there? What happens if I use "target variable" to display a variable from a different module? What if I then dereference that variable? If the dereference should happen in the context of the other module, then I guess the "module" should be a property of the value, not of the current execution context. And it sounds like some address space-y thing would help. But that might require teaching a lot of places in lldb about address spaces, in particular that dereferencing a value in one address space should produce a value in the same address space (at least for "near" pointers or something).
If we should really use the current frame as the context, then I guess we'd need some sort of interface to ask a stack frame to get the value of a "Value".

The fact that the code address space is separated from the memory address space is really what makes things complicated. However, we know for sure that all memory reads made while evaluating DWARF expressions or variables always target the module memory space, never the code space.

I must confess that I had not really tested target variable so far; I did it today and found that it already almost works. What happens is that the location of the global variable is calculated with DWARFExpression::Evaluate (in my tests I only see DW_OP_addr, but maybe there could be other ways?) and there it calls Value::ConvertToLoadAddress() passing the correct module, and this produces an address in the form module_id|offset, which is then used in Value::GetValueAsData(), which currently sends qWasmMem requests.

We don’t really need the frame_index in Value::GetValueAsData; the stub only uses frame_index to calculate the corresponding module_id, and we already pass the current Module* to this function.
So it seems possible to make this work for WebAssembly, but the tricky part is how to make Wasm-specific reads in a class like Value that should not have any knowledge about Wasm.

Actually, that brings up an idea: a process runtime plug-in for languages that have a runtime that can be communicated with. Java has a runtime, Wasm has a runtime, most Javascript engines have runtimes. Maybe this wouldn't be too hard to abstract? Then we can add the GetGlobal(index), GetLocal(index), GetStack(index) to this plug-in?

I think that is a very interesting idea to explore. We already have language runtime plugins, and they do have the ability to refine the type of a variable/value. Maybe they could also somehow help with computing the value of a variable? Then instead of a GetGlobal/Local(index), we could have GetValue(Variable), and they would be the ones responsible for making sense of the dwarf expressions and address spaces?

Summarizing, if we assume that we can create:

  • WasmProcess, derived from GDBRemoteProcess,
  • WasmUnwind, derived from Unwind, and
  • WasmDWARFEvaluator, derived from a new class DWARFEvaluator,

then we would not have to touch any existing GDBRemote- and DWARFExpression code.
At this point the way we query locals/globals/stack and the call stack from the Wasm engine would be just an implementation detail of these Wasm plugin classes (assuming that we can define new gdb-remote custom commands like qWasmLocal, qWasmGlobal, qWasmStackValue, qWasmCallStack). Then we would not really need to abstract this functionality with a generic interface, in my opinion.

What is left to decide would be “just” how to handle memory. In particular how to make Wasm-specific memory requests, targeting a particular Wasm module, from class Value.

paolosev updated this revision to Diff 261780. Edited May 4 2020, 5:11 AM
paolosev retitled this revision from [LLDB] Add class ProcessWasm for WebAssembly debugging to [LLDB] Add class WasmProcess for WebAssembly debugging.
paolosev edited the summary of this revision. (Show Details)

This patch is a work in progress where I am refactoring the code, as suggested, to remove almost all dependencies on Wasm from the LLDB core that were present in the previous patch. In particular:

  • The logic to evaluate a DWARF expression is now separated from the DWARF expression data. There is a new kind of plugin, DWARFEvaluator, that can be used to define platform-specific evaluators. Base class DWARFEvaluator contains all the evaluation code extracted from class DWARFExpression.
  • Plugin class WasmDWARFEvaluator takes care of evaluating WASM-specific codes like DW_OP_WASM_location.
  • Process-plugin class WasmProcess provides functions to access the Wasm engine state through a GDB-remote connection. Now WasmProcess contains all the logic to send Wasm-specific queries to the GDB stub; there are no more changes to existing classes like GDBRemoteClientConnection.
  • Like before, class UnwindWasm handles stack unwinding by requesting the call stack from the Wasm engine.

In this way, the patch does not impose (almost) any knowledge of WebAssembly on the LLDB core, besides the logic to register Wasm-specific plugin classes. The only exception still left is in class Value where in this patch there is still a dependency on WasmProcess that I need to remove.

Please, let me know if this is a step in the right direction, in your opinion.
(I realize that this patch has become annoyingly large, but of course once we understand what could be a good solution I will split it into more manageable pieces).

Interesting approach to DWARF expression evaluation, though it might be simpler to leave the DWARFExpression as it is, and have the plug-in part only handle unknown opcodes. Right now if you want to add just one opcode, you must subclass and make a plug-in instance where 99% of opcodes get forwarded to the original implementation and the one new opcode gets handled by the plug-in. What if the DWARFExpression class was passed to a plug-in that only handles unknown opcodes? Might be fewer code changes and be a bit cleaner. The plug-in would handle only the DW_OP_WASM_location and would still be found/detected if and only if one is encountered?

As for unwinding, I still don't think we need the UnwindWASM class. See my inline comment in "Unwind &Thread::GetUnwinder()" for a bit of reasoning. It is very common for language runtimes to support supplying the stack frames for a thread, so this should be built into the lldb_private::Process/lldb_private::Thread classes. For stack unwinding I would suggest adding functions to lldb_private::Thread:

class Thread {
  /// Check if the runtime supports unwinding call stacks and return true if so.
  ///
  /// If the language runs in a runtime that knows how to unwind the call stack
  /// for a thread, then this function should return true.
  ///
  /// If true is returned, unwinding will use a RuntimeUnwind class that will call
  /// into this class' Thread::GetRuntimeFrame* functions to do the unwinding.
  /// If false is returned, the standard UnwindLLDB will be used where unwinding
  /// will be done using registers, unwind info and other debug info.
  virtual bool HasRuntimeUnwindSupport() {
    return false; // Default to using UnwindLLDB()
  }
  virtual uint32_t GetRuntimeFrameCount() {
    return 0;
  }
  virtual bool GetRuntimeFrameInfoAtIndex(uint32_t frame_idx, lldb::addr_t &cfa,
                                          lldb::addr_t &pc,
                                          bool &behaves_like_zeroth_frame) {
    return false;
  }
  virtual lldb::RegisterContextSP GetRuntimeFrameRegisterContext(uint32_t frame_idx) {
    return lldb::RegisterContextSP();
  }
};

Then either ThreadGDBRemote, or possibly a subclass like ThreadWasm will implement these virtual functions.

For the GetWasmLocal, GetWasmGlobal, GetWasmStackValue, I still think abstracting this into the lldb_private::Process/lldb_private::Thread is the right way to do this. The way you have this right now, there is no way to tell how big the buffer needs to be to fetch any of these local/global/stack values. They all assume 16 bytes is enough.

If we have a runtime that knows about information in a stack frame, function or global, then we should have a way to fetch that from a process/thread. For example:

class Process {
  /// Fetch a global variable from the process runtime if this is supported.
  ///
  /// \param var_id A 64 bit value that needs to encode all of the data needed
  ///               to fetch a global variable and should uniquely identify a
  ///               global variable in a module.
  /// \param buf A buffer that will get the bytes from the global variable. If
  ///            buf is NULL, then the size will be returned so the client
  ///            knows how large the variable is.
  /// \param buffer_size The size of the buffer pointed to by "buf". If buf is
  ///                    NULL, this value can be zero to indicate the function
  ///                    should return the actual size of the global variable.
  ///
  /// \returns The actual size of the variable, which might be larger than the
  ///          "buffer_size" parameter.
  virtual size_t GetRuntimeGlobalValue(lldb::user_id_t var_id, void *buf,
                                       size_t buffer_size) {
    return 0;
  }
};
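A caller would then use the null-buffer call to discover the size before fetching. A sketch of that two-step pattern, with a mock process standing in for the proposed API (the 8-byte payload is made up):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Mock of the size-query protocol proposed above: calling with buf == nullptr
// (or a too-small buffer) still returns the real variable size, so the caller
// can allocate and retry. MockProcess is a stand-in, not a real LLDB class.
class MockProcess {
public:
  size_t GetRuntimeGlobalValue(uint64_t var_id, void *buf, size_t size) {
    static const unsigned char value[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    (void)var_id; // A real implementation would look the variable up.
    if (buf && size >= sizeof(value))
      std::memcpy(buf, value, sizeof(value));
    return sizeof(value); // Always report the actual size.
  }
};

std::vector<unsigned char> FetchGlobal(MockProcess &process, uint64_t var_id) {
  // First call: query the size only.
  size_t size = process.GetRuntimeGlobalValue(var_id, nullptr, 0);
  // Second call: fetch into a correctly sized buffer.
  std::vector<unsigned char> bytes(size);
  process.GetRuntimeGlobalValue(var_id, bytes.data(), bytes.size());
  return bytes;
}
```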

We would need to do something similar on the Thread class for locals and stack values:

class Thread {
  virtual size_t GetRuntimeLocalValue(uint32_t frame_idx, lldb::user_id_t var_id,
                                      void *buf, size_t buffer_size) {
    return 0;
  }
  virtual size_t GetRuntimeStackValue(uint32_t frame_idx, lldb::user_id_t var_id,
                                      void *buf, size_t buffer_size) {
    return 0;
  }
};
lldb/source/Expression/DWARFEvaluator.cpp
328–330

This shouldn't be in here. Remove DW_OP_WASM_location as it is custom.

lldb/source/Plugins/Process/wasm/WasmProcess.cpp
71–76 ↗(On Diff #261780)

This seems flaky to me. How are you ever going to get the frame index right unless we encode it into the "vm_addr" using bitfields like we spoke about before? And if we encode it all into the 64 bit address, then we don't need this special read. Seems like we need to figure out if we are going to encode everything into an uint64_t or not. That will be the easiest way to integrate this into LLDB as all memory reads take a "lldb::addr_t" right now (no memory space information). We would change ReadMemory and WriteMemory to start taking a more complex type like:

AddressSpecifier {
  lldb::addr_t addr;
  uint64_t segment;
};

But that will be a huge change to the LLDB source code that should be done in a separate patch before we do anything here.

185 ↗(On Diff #261780)

This API seems wrong to be on the process. It should be on the ThreadWasm class (if we end up with one). see my main comments for more details.

lldb/source/Plugins/Process/wasm/WasmProcess.h
20 ↗(On Diff #261780)

Should be named ProcessWasm to follow all other process classes (if we decide we need to specialize a Wasm process class and not just abstract it into ProcessGDBRemote).

lldb/source/Target/Thread.cpp
1857–1863 ↗(On Diff #261780)

If the call stack is available through the Process/Thread interface, I think we should be asking the thread for this information. So this code could be:

if (!m_unwinder_up) {
    if (HasRuntimeUnwindSupport())
      m_unwinder_up.reset(new RuntimeUnwind(*this));
    else
      m_unwinder_up.reset(new UnwindLLDB(*this));
  }

RuntimeUnwind would be a class that will fetch the stack frame information through the new virtual functions on the Thread class, but only if the virtual Thread::HasRuntimeUnwindSupport() returns true. As new languages and runtimes are added in the future I don't want to see this function look like:

if (!m_unwinder_up) {
    if (CalculateTarget()->GetArchitecture().GetMachine() ==
        llvm::Triple::wasm32)
      m_unwinder_up.reset(new wasm::UnwindWasm(*this));
    else if (CalculateTarget()->GetArchitecture().GetMachine() ==
        llvm::Triple::Rust)
      m_unwinder_up.reset(new rust::UnwindRust(*this));
    else if (CalculateTarget()->GetArchitecture().GetMachine() ==
        llvm::Triple::Go)
      m_unwinder_up.reset(new go::UnwindGo(*this));
    else
      m_unwinder_up.reset(new UnwindLLDB(*this));
  }

So we should find a way to generalize the stack frames being fetched from the process/thread classes using virtual functions.

I know this is the way GDB was built (many ifdefs and arch specific detecting code everywhere), but we have plug-ins in LLDB that are there to abstract us from this kind of code.

labath added a comment. May 5 2020, 4:41 AM

Interesting approach to DWARF expression evaluation, though it might be simpler to leave the DWARFExpression as it is, and have the plug-in part only handle unknown opcodes. Right now if you want to add just one opcode, you must subclass and make a plug-in instance where 99% of opcodes get forwarded to the original implementation and the one new opcode gets handled by the plug-in. What if the DWARFExpression class was passed to a plug-in that only handles unknown opcodes? Might be fewer code changes and be a bit cleaner. The plug-in would handle only the DW_OP_WASM_location and would still be found/detected if and only if one is encountered?

I think that the main reason this is awkward is that the evaluator is plugged in at the level of a single dwarf operation. That requires passing around a lot of state. If it were plugged in at the level of evaluating the entire expression, then the amount of state to pass would be much smaller. Obviously, the evaluation itself would then need to be factored into multiple functions so that one can override just the evaluation of a single expression, but that's pretty standard software engineering work, and something that we probably should do for code health anyway.

In fact, I believe that if we do this right then the result could be much cleaner than the current situation. Going off of the idea of caching the evaluator in the module, what if we don't "cache" the evaluator itself, but actually a factory (function) for it. The advantage of that (the factory function creating an evaluator instance for a specific expression) is that we could store a lot of the evaluation state in the evaluator object, instead of passing it all around through function arguments (I find the long function argument lists to be one of the main problems of the current DWARFExpression class).

As for unwinding, I still don't think we need the UnwindWASM class. See my inline comment in "Unwind &Thread::GetUnwinder()" for a bit of reasoning. It is very common for language runtimes to support supplying the stack frames for a thread, so this should be built into the lldb_private::Process/lldb_private::Thread classes. For stack unwinding I would suggest adding functions to lldb_private::Thread:

class Thread {
  /// Check if the runtime supports unwinding call stacks and return true if so.
  ///
  /// If the language runs in a runtime that knows how to unwind the call stack
  /// for a thread, then this function should return true.
  ///
  /// If true is returned, unwinding will use a RuntimeUnwind class that will call
  /// into this class' Thread::GetRuntimeFrame* functions to do the unwinding.
  /// If false is returned, the standard UnwindLLDB will be used where unwinding
  /// will be done using registers, unwind info and other debug info.
  virtual bool HasRuntimeUnwindSupport() {
    return false; // Default to using UnwindLLDB()
  }
  virtual uint32_t GetRuntimeFrameCount() {
    return 0;
  }
  virtual bool GetRuntimeFrameInfoAtIndex(uint32_t frame_idx, lldb::addr_t &cfa,
                                          lldb::addr_t &pc,
                                          bool &behaves_like_zeroth_frame) {
    return false;
  }
  virtual lldb::RegisterContextSP GetRuntimeFrameRegisterContext(uint32_t frame_idx) {
    return lldb::RegisterContextSP();
  }
};

Then either ThreadGDBRemote, or possibly a subclass like ThreadWasm will implement these virtual functions.

If we have ThreadWasm then I believe we don't need any of this as ThreadWasm can just override the appropriate function which returns the unwinder object.

For the GetWasmLocal, GetWasmGlobal, GetWasmStackValue, I still think abstracting this into the lldb_private::Process/lldb_private::Thread is the right way to do this. The way you have this right now, there is no way to tell how big the buffer needs to be to fetch any of these local/global/stack values. They all assume 16 bytes is enough.

For me, one of the main advantages of having a wasm-specific class is that it can make wasm-specific assumptions. That said, the APIs in question are definitely very C-like and could definitely be brought into the C++14 world.

lldb/source/Interpreter/CommandInterpreter.cpp
738–755

One way to improve this would be to have lldb detect the kind of plugin that it is talking to and create an appropriate instance based on that. However, that would require more refactorings, so this is good for a start anyway.

lldb/source/Plugins/Process/wasm/WasmProcess.cpp
185 ↗(On Diff #261780)

I guess that's because Paolo was avoiding (or not being able to) create ThreadWasm objects. If we make that possible (per my other comment) then this should be doable as well.

lldb/source/Target/Thread.cpp
1857–1863 ↗(On Diff #261780)

The way I would imagine this happening is that ProcessWasm overrides the appropriate method (if there isn't one we can create it) so that it creates ThreadWasms instead of plain ThreadGdbRemotes. Then ThreadWasm can override GetUnwinder to return an UnwindWasm instance.

labath added a comment. May 5 2020, 4:43 AM

Please, let me know if this is a step in the right direction, in your opinion.

I believe it is. I think that creating ThreadWasm objects would make it even better (and address a lot of Greg's issues). Unfortunately, I still don't know what to do about the whole Value business...

labath added a comment. May 6 2020, 7:44 AM

I forgot I wanted to respond to this part.

The fact that the code address space is separated from the memory address space is really what makes things complicated. However, we know for sure that all memory reads made while evaluating DWARF expressions or variables always target the module memory space, never the code space.

I must confess that I had not really tested target variable so far; I did it today and found that it already almost works. What happens is that the location of the global variable is calculated with DWARFExpression::Evaluate (in my tests I only see DW_OP_addr, but maybe there could be other ways?) and there it calls Value::ConvertToLoadAddress() passing the correct module, and this produces an address in the form module_id|offset, which is then used in Value::GetValueAsData(), which currently sends qWasmMem requests.

Ok, I can see how that would sort of work. But what about dereferencing global variables (target variable *global_ptr)? I have a feeling that will be trickier because that address doesn't go through file->load address conversion, as it's expected to already be a load address. I don't think that we will automatically infer the right "module" for it, and I'm not sure how to make lldb infer the right module/address space there without teaching it a lot more about non-flat address spaces.

paolosev updated this revision to Diff 262824.May 8 2020, 1:12 AM
paolosev marked 8 inline comments as done.
paolosev edited the summary of this revision. (Show Details)
  • For the DWARF expression evaluation, I am here following the suggestion of defining a Factory plugin (DWARFEvaluatorFactory) which is cached by Module. Is there a way to define a simple plugin function, avoiding all the overhead of a plugin class? Most of the state is in DWARFExpression, which is passed to the evaluator, so the amount of state to pass around is much smaller.

Note that WasmDWARFEvaluator::Evaluate() not only needs to handle Wasm-specific opcodes like DW_OP_WASM_location, but might also handle some "standard" opcodes in a Wasm-specific way. For example, static variables are generally encoded with DW_OP_addr, but for WebAssembly the corresponding offset is the offset in the module Data section, which is mapped into an area of the Memory section when the module is loaded. So, memory reads related to DW_OP_addr refer to a different space and should be dealt with differently by the runtime.
This makes target variable *global_ptr work when we have multiple modules, because we can retrieve the module_id from the Module associated with the DWARFExpression.

  • Wasm addresses may refer to different address spaces, and are internally encoded with 64 bits as:
63 62 61          32 31           0
+--+---------------+--------------+
|T |   module_id   |    offset    |
+--+---------------+--------------+

where T is 0:Code, 1:Data, 2:Memory.

  • I introduced a class ThreadWasm, as suggested, which overrides the function that creates the unwinder object. This required a small change to ProcessGDBRemote as well: a new virtual factory method to create a thread.
lldb/source/Plugins/Process/wasm/WasmProcess.cpp
71–76 ↗(On Diff #261780)

I look forward to implementing this more cleanly in the future with AddressSpecifiers in ReadMemory and WriteMemory; for the moment I have implemented it with bitfields as suggested:

enum WasmAddressType {
  Code = 0x00,
  Data = 0x01,
  Memory = 0x02,
  Invalid = 0x03
};

struct wasm_addr_t {
  uint64_t type : 2;
  uint64_t module_id : 30;
  uint64_t offset : 32;
};
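Because the in-memory layout of bit-fields is implementation-defined, the conversion to and from the flat lldb::addr_t value is best done explicitly. A minimal sketch of that packing, with hypothetical helper names ToUint64/FromUint64 (not part of the actual patch):

```cpp
#include <cstdint>

// Sketch: explicit pack/unpack for the wasm_addr_t layout described above
// (bits 63-62 = type, 61-32 = module_id, 31-0 = offset).
enum WasmAddressType : uint64_t { Code = 0, Data = 1, Memory = 2, Invalid = 3 };

struct wasm_addr_t {
  uint64_t type : 2;
  uint64_t module_id : 30;
  uint64_t offset : 32;

  // Pack into the flat uint64_t used as lldb::addr_t.
  uint64_t ToUint64() const {
    return (uint64_t(type) << 62) | (uint64_t(module_id) << 32) |
           uint64_t(offset);
  }

  // Unpack a flat address back into its components.
  static wasm_addr_t FromUint64(uint64_t addr) {
    wasm_addr_t wa;
    wa.type = (addr >> 62) & 0x3;
    wa.module_id = (addr >> 32) & 0x3fffffff;
    wa.offset = addr & 0xffffffff;
    return wa;
  }
};
```

The explicit shifts keep the wire format stable regardless of how the compiler lays out the bit-fields.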

I still need to override ReadMemory here because it is not always possible to generate a full wasm_addr_t without making too many changes. Sometimes, for example in ValueObjectChild::UpdateValue -> ValueObject::GetPointerValue, the existing code generates addresses without any knowledge of module_ids. But this is not a big problem, because all these reads always refer to the Wasm Memory, and the module_id can easily be retrieved from the current execution context. This is always valid because a Wasm module can never read or write the memory of another module in the same process; at most, Memories can be shared, but that is transparent to the debugger.

However, this requires a small change to ReadMemory: I would introduce ExecutionContext *exe_ctx as an optional final parameter, null by default, passed by Value and ValueObject, ignored by all other Process classes, and used by ProcessWasm to deduce the module.
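As a toy model of that idea (all names here are hypothetical stand-ins, not LLDB API): an address that arrives without module bits gets qualified using the execution context, while a fully qualified address passes through unchanged.

```cpp
#include <cstdint>

// Toy stand-in for the execution context passed down to ReadMemory; the real
// lldb_private::ExecutionContext would supply the current frame/module.
struct ExecutionContext {
  uint32_t current_module_id;
};

// If addr is a bare 32-bit offset, qualify it as a Memory-space address of the
// context's module (T == 2 in the encoding discussed above); otherwise return
// it as-is.
uint64_t QualifyAddress(uint64_t addr, const ExecutionContext *exe_ctx) {
  const uint64_t kMemorySpace = uint64_t(2) << 62;
  if (addr >> 32) // already carries type/module bits
    return addr;
  if (exe_ctx)    // bare offset: use the context's module id
    return kMemorySpace | (uint64_t(exe_ctx->current_module_id) << 32) | addr;
  return addr;    // no context available: leave the flat address untouched
}
```

This matches the claim above: reads without module information always target the Wasm Memory of the currently executing module, so the context alone is enough to disambiguate.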

Hello @clayborg , @labath,
Any thoughts on this latest patch? :-)

Hello @clayborg , @labath,
Any thoughts on this latest patch? :-)

Sorry about the delay. I think that in terms of the design we've come as far as we can without making substantial changes to a lot of lldb interfaces. The question on my mind is: is that enough?

Here, I am mainly thinking about the introduction of the ExecutionContext argument to ReadMemory. In a universe with static flat address spaces, the argument looks flat-out wrong. If one thinks of "memory" as something more dynamic, it does not seem so bad, as it can be viewed as the context in which to interpret memory addresses. However, I am having trouble assigning semantics to it beyond saying "it does what WebAssembly needs". Maybe it's just because I live in a flat universe and lack address-space intuition...

Anyway, I think it would be good to get more people's opinions on this. For a start, I nominate Jim. :)

The problem with "looking forward to implementing more cleanly in the future with AddressSpecifiers" is that _everyone_ is looking forward to having address spaces, but no one is actually working on implementing them. And I want to be careful about accumulating technical debt like this upstream, because it is technical debt that makes future implementations hard.

Note that WasmDWARFEvaluator::Evaluate() not only needs to handle Wasm-specific opcodes like DW_OP_WASM_location, but might also need to handle some "standard" opcodes in a Wasm-specific way. For example, static variables are generally encoded with DW_OP_addr, but for WebAssembly the corresponding offset is an offset into the module's Data section, which is mapped into an area of the Memory section when the module is loaded. So memory reads related to DW_OP_addr refer to a different address space and must be dealt with differently by the runtime.
This makes target variable *global_ptr work when we have multiple modules, because we can retrieve the module_id from the Module associated with the DWARFExpression.

I am confused by this line of reasoning. For a global variable, the DW_OP_addr describes the location of the variable itself (the value of &global_ptr). That is what makes it possible to display the value of the pointer (global_ptr). Displaying the value of the pointed-to object (*global_ptr) is a different thing, and AFAIK it completely bypasses any dwarf expressions. Are you saying that the value of the pointer somehow inherits the "address space" of the memory where the pointer itself is located?

jingham added a comment. Edited May 29 2020, 11:47 AM

Hello @clayborg , @labath,
Any thoughts on this latest patch? :-)

Sorry about the delay. I think that in terms of the design we've come as far as we can without making substantial changes to a lot of lldb interfaces. The question on my mind is: is that enough?

Here, I am mainly thinking about the introduction of the ExecutionContext argument to ReadMemory. In a universe with static flat address spaces, the argument looks flat-out wrong. If one thinks of "memory" as something more dynamic, it does not seem so bad, as it can be viewed as the context in which to interpret memory addresses. However, I am having trouble assigning semantics to it beyond saying "it does what WebAssembly needs". Maybe it's just because I live in a flat universe and lack address-space intuition...

Anyway, I think it would be good to get more people's opinions on this. For a start, I nominate Jim. :)

The problem with "looking forward to implementing more cleanly in the future with AddressSpecifiers" is that _everyone_ is looking forward to having address spaces, but no one is actually working on implementing them. And I want to be careful about accumulating technical debt like this upstream, because it is technical debt that makes future implementations hard.

I wasn't following very closely, sorry...

Before getting too far along, I'd just like to toss out something...

I always thought it was a shame that we ever introduced Process::ReadMemory(addr_t addr, ...). We already have a nice Address object. Seems to me the only reason we don't use it everywhere is so we can do math on addresses more simply. For instance, the Address class seems like the proper place to pass along address-space information. After all, when you form the address you are planning to fetch data from, you have to know what address space you were targeting, so you could naturally encode it at that point.

What about replacing ProcessReadMemory(addr_t addr, ...) with ProcessReadMemory(Address addr, ...), or even banning the use of lldb::addr_t everywhere except in the bowels of Process subclasses and as an interchange format for getting addresses as text from users? You might want to store lldb::addr_t's for Symbol values for space concerns, but presumably the SymbolFile would know the relevant address space, so it would fix that up and always hand out Addresses.

This would be a pretty big but mostly formal change. It would seem more natural: Address is our abstraction for addresses in a process, yet instead we have this odd mix of APIs, some taking lldb::addr_t and some taking Address.

Note that WasmDWARFEvaluator::Evaluate() not only needs to handle Wasm-specific opcodes like DW_OP_WASM_location, but might also need to handle some "standard" opcodes in a Wasm-specific way. For example, static variables are generally encoded with DW_OP_addr, but for WebAssembly the corresponding offset is an offset into the module's Data section, which is mapped into an area of the Memory section when the module is loaded. So memory reads related to DW_OP_addr refer to a different address space and must be dealt with differently by the runtime.
This makes target variable *global_ptr work when we have multiple modules, because we can retrieve the module_id from the Module associated with the DWARFExpression.

I am confused by this line of reasoning. For a global variable, the DW_OP_addr describes the location of the variable itself (the value of &global_ptr). That is what makes it possible to display the value of the pointer (global_ptr). Displaying the value of the pointed-to object (*global_ptr) is a different thing, and AFAIK it completely bypasses any dwarf expressions. Are you saying that the value of the pointer somehow inherits the "address space" of the memory where the pointer itself is located?

What about replacing ProcessReadMemory(addr_t addr, ...) with ProcessReadMemory(Address addr, ...), or even banning the use of lldb::addr_t everywhere except in the bowels of Process subclasses and as an interchange format for getting addresses as text from users? You might want to store lldb::addr_t's for Symbol values for space concerns, but presumably the SymbolFile would know the relevant address space, so it would fix that up and always hand out Addresses.

This would be a pretty big but mostly formal change. It would seem more natural: Address is our abstraction for addresses in a process, yet instead we have this odd mix of APIs, some taking lldb::addr_t and some taking Address.

First of all, I really apologize for the huge delay in replying to your last comments. I had to focus on different tasks in the last couple of weeks, and I needed to find some time to understand how Process::ReadMemory (and WriteMemory) could be modified to work with Address objects rather than addr_t.
It is certainly possible, but a problem, IMHO, is that an Address can represent several things:

  • An offset in a section in a file
  • An offset in a loaded section
  • An absolute address in memory

If we modify ReadMemory to be:

Process::ReadMemory(const Address& address, void *buf, size_t size, Status &error)

then we need to convert this Address into a concrete addr_t there, in order to actually read from the MemoryCache (MemoryCache::Read) or from the debuggee (ReadMemoryFromInferior).
But how do we do this conversion? If we are dealing with a file address we should use lldb::addr_t Address::GetFileAddress(); otherwise we should call lldb::addr_t Address::GetLoadAddress(Target *target). I don't know if we have enough information in Process::ReadMemory() to always correctly differentiate between the two cases, and in the latter case we would have to rely on Process::GetTarget() returning a valid target to pass to Address::GetLoadAddress().

The question then is: should Address know if it needs to be interpreted as a file address or load address? Should we have an AddressType field in Address?
We already have this enum that maybe we could reuse:

enum AddressType {
  eAddressTypeInvalid = 0,
  eAddressTypeFile, ///< An address as found in an object or symbol file
  eAddressTypeLoad, ///< An address in the current target inferior process
  eAddressTypeHost  ///< An address in the process that is running this code
};
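To illustrate the idea (a hypothetical sketch with toy types, not a proposed patch): if Address carried such an AddressType, ReadMemory could resolve it to a concrete addr_t without guessing which conversion applies.

```cpp
#include <cstdint>

enum AddressType {
  eAddressTypeInvalid = 0,
  eAddressTypeFile, // an address as found in an object or symbol file
  eAddressTypeLoad, // an address in the target inferior process
  eAddressTypeHost  // an address in the process running this code
};

// Toy stand-in for lldb_private::Address, carrying an explicit AddressType.
struct ToyAddress {
  AddressType type;
  uint64_t value;
};

// Resolve to the concrete address to read from. 'slide' models the load bias
// applied when a file address is mapped into the process (the role played by
// Address::GetLoadAddress(target) vs Address::GetFileAddress()).
bool ResolveForRead(const ToyAddress &a, uint64_t slide, uint64_t &out) {
  switch (a.type) {
  case eAddressTypeFile:
    out = a.value + slide; // file address: rebase by the load bias
    return true;
  case eAddressTypeLoad:
    out = a.value;         // already a load address
    return true;
  default:
    return false;          // host/invalid: not readable via Process
  }
}
```

The open question in the text above remains: whether Process::ReadMemory would always have enough context to supply the equivalent of 'slide' (i.e., a valid target).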

Process::ReadMemory is called from many places in the LLDB code. In some of them, like Value::GetValueAsData, DWARFExpression::Evaluate, and ValueObject::GetPointeeData, it should be fairly simple to construct the Address to read from. (And that should solve the problems for Wasm.)
But in other places it is not so clear what we should do. Some code works with absolute addresses and has no Sections available (see the table below). In these cases, we could rely on the Address constructor that accepts just an absolute addr_t as argument and leaves m_section_wp null:

Address(lldb::addr_t abs_addr);

The existence of this constructor already makes most of LLDB compile even after changing ReadMemory to take an Address argument, in class Process and in Process-derived classes. But then, in many cases, Address would be just a wrapper over an addr_t, which seems to go against the idea of using Address as our single abstraction for all addresses in a process.

This table describes where Process::ReadMemory is called and how the addresses we pass to ReadMemory originate:

| File | Function | Origin of the addr_t |
|---|---|---|
| ABI\AArch64\ABIMacOSX_arm64.cpp | LoadValueFromConsecutiveGPRRegisters | addr_t from reg_ctx |
| ABI\AArch64\ABISysV_arm64.cpp | LoadValueFromConsecutiveGPRRegisters | addr_t from RegisterContext |
| ABI\ARM\ABISysV_arm.cpp | GetReturnValuePassedInMemory | addr_t from RegisterContext |
| ABI\PowerPC\ABISysV_ppc64.cpp | GetStructValueObject | addr_t from Register |
| Commands\CommandObjectMemory.cpp | CommandObjectMemoryFind::ProcessMemoryIterator::operator[] | addr_t from m_base_data_address |
| DynamicLoader\MacOSX-DYLD\DynamicLoaderMacOSXDYLD.cpp | DoInitialImageFetch | addr_t from Process::GetImageInfoAddress |
| Expression\IRExecutionUnit.cpp | IRExecutionUnit::DisassembleFunction | addr_t from JittedFunction::m_remote_addr |
| Expression\IRMemoryMap.cpp | IRMemoryMap::ReadMemory(addr_t process_address) | addr_t passed as argument |
| JITLoader\GDB\JITLoaderGDB.cpp | ReadJITEntry | addr_t passed as argument |
| Language\CPlusPlus\LibCxx.cpp | formatters::LibCxxMapIteratorSyntheticFrontEnd::Update | addr_t from ValueObject::GetValueAsUnsigned() |
| Language\CPlusPlus\LibCxxVector.cpp | formatters::LibcxxVectorBoolSyntheticFrontEnd::GetChildAtIndex | addr_t from m_base_data_address |
| Language\ObjC\CF.cpp | formatters::CFBitVectorSummaryProvider | ValueObject::GetValueAsUnsigned() => Process::ReadPointerFromMemory() => addr_t |
| Language\ObjC\NSArray.cpp (and similar classes) | formatters::XXX::Update | addr_t from ValueObject::GetValueAsUnsigned() |
| LanguageRuntime\ObjC\AppleObjCRuntime\AppleObjCClassDescriptorV2.cpp | ClassDescriptorV2::objc_class_t::Read (and similar functions) | addr_t passed as argument |
| LanguageRuntime\ObjC\AppleObjCRuntime\AppleObjCRuntimeV1.cpp | AppleObjCRuntimeV1::UpdateISAToDescriptorMapIfNeeded | addr_t from GetISAHashTablePointer() or from DataExtractor |
| LanguageRuntime\ObjC\AppleObjCRuntime\AppleObjCRuntimeV2.cpp | AppleObjCRuntimeV2::UpdateISAToDescriptorMapDynamic | addr_t from Process::AllocateMemory |
| LanguageRuntime\ObjC\AppleObjCTrampolineHandler\AppleObjCRuntimeV1.cpp | AppleObjCTrampolineHandler::AppleObjCVTables::VTableRegion::SetUpRegion | addr_t from ValueObject::GetValueAsUnsigned |
| LanguageRuntime\RenderScript\RenderScriptRuntime\RenderScriptRuntime.cpp | GetArgsX86 (and similar) | addr_t from reg_ctx |
| ObjectFile\PECOFF\ObjectFilePECOFF.cpp | ObjectFilePECOFF::ReadImageData(offset) | addr_t from m_image_base, addr_t calculated from COFF header |
| Symbol\CompactUnwindInfo.cpp | CompactUnwindInfo::ScanIndex | addr_t from Section::GetLoadBaseAddress() |
| Symbol\ObjectFile.cpp | ObjectFile::ReadMemory | addr_t passed as argument |
| Symbol\ObjectFile.cpp | ObjectFile::ReadSectionData(Section, offset_t) | addr_t from Section::GetLoadBaseAddress |
| Symbol\Type.cpp | Type::ReadFromMemory(addr_t) | addr_t passed as argument |
| SystemRuntime\MacOSX\SystemRuntimeMacOSX.cpp | SystemRuntimeMacOSX::GetQueueNameFromThreadQAddress | addr_t from Process::ReadPointerFromMemory |
| SystemRuntime\MacOSX\SystemRuntimeMacOSX.cpp | SystemRuntimeMacOSX::GetExtendedBacktraceThread (and similar fns) | addr_t from GetThreadItemInfo |
| Target\RegisterContext.cpp | RegisterContext::ReadRegisterValueFromMemory(addr_t) | addr_t passed as argument |
| Target\Target.cpp | Target::CreateAddressInModuleBreakpoint(addr_t) | addr_t passed as argument |
| Target\ThreadPlanTracer.cpp | ThreadPlanAssemblyTracer::Log | addr_t from reg_ctx->GetPC() |
| TypeSystem\Clang\TypeSystemClang.cpp | TypeSystemClang::DumpSummary | addr_t from DataExtractor |
| Utility\RegisterContextMemory.cpp | RegisterContextMemory::ReadAllRegisterValues | addr_t from reg_data_addr passed to constructor |

A similar analysis could be done for WriteMemory, of course. I might have forgotten something, but the point is that in most places where we call Process::ReadMemory we compute the address as a uint64_t taken from a register, read from a DataExtractor, or passed in from other functions; there is no Section available from which to construct an Address. I am not sure whether this is a problem or not.
In other words... I will need some guidance on how best to make this refactoring :-)


Note that WasmDWARFEvaluator::Evaluate() not only needs to handle Wasm-specific opcodes like DW_OP_WASM_location, but might also need to handle some "standard" opcodes in a Wasm-specific way. For example, static variables are generally encoded with DW_OP_addr, but for WebAssembly the corresponding offset is an offset into the module's Data section, which is mapped into an area of the Memory section when the module is loaded. So memory reads related to DW_OP_addr refer to a different address space and must be dealt with differently by the runtime.
This makes target variable *global_ptr work when we have multiple modules, because we can retrieve the module_id from the Module associated with the DWARFExpression.

I am confused by this line of reasoning. For a global variable, the DW_OP_addr describes the location of the variable itself (the value of &global_ptr). That is what makes it possible to display the value of the pointer (global_ptr). Displaying the value of the pointed-to object (*global_ptr) is a different thing, and AFAIK it completely bypasses any dwarf expressions. Are you saying that the value of the pointer somehow inherits the "address space" of the memory where the pointer itself is located?

I wrote that badly. What I meant was that if I have different Wasm modules, and one of these modules has a static variable, like:

static uint8_t kBuff[3];

the corresponding DWARF data will be something like:

0x00000026:   DW_TAG_variable
                DW_AT_name	("kBuff")
                DW_AT_type	(0x0000003b "uint8_t[3]")
                ...
                DW_AT_location	(DW_OP_addr 0x0)
                DW_AT_linkage_name	("_ZL5kBuff")

For Wasm, this DW_OP_addr location represents an offset into the Data section, and the Data section is mapped into a region of that module's Memory. So we can use this DWARF data to display the value of the variable, but we need to handle the DW_OP_addr opcode in a Wasm-specific way. Then, once we have the memory location, the actual value can be found and displayed in the usual way, by reading memory.
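That rebasing step can be sketched as follows (the helper name and the idea of a single data-section load base are hypothetical; in practice the base would come from the engine via gdb-remote):

```cpp
#include <cstdint>

// For Wasm, DW_OP_addr's operand is an offset into the module's Data section,
// which the engine maps into linear Memory at load time. Before reading, the
// evaluator must rebase the offset by the Data section's load base and tag the
// result as a Memory-space address of the right module (T == 2 in the address
// encoding discussed earlier in this thread).
uint64_t RebaseDataAddress(uint32_t module_id, uint32_t dw_op_addr_offset,
                           uint32_t data_section_load_base) {
  const uint64_t kMemorySpace = uint64_t(2) << 62;
  return kMemorySpace | (uint64_t(module_id) << 32) |
         uint64_t(data_section_load_base + dw_op_addr_offset);
}
```

The resulting address can then be served by a qWasmMem-style read against the owning module's Memory, as described for the other Memory-space accesses above.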