This is an archive of the discontinued LLVM Phabricator instance.

Paths

lldb/include/lldb/Utility/ArchSpec.h
lldb/source/API/SystemInitializerFull.cpp
lldb/source/Plugins/ObjectFile/CMakeLists.txt
lldb/source/Plugins/ObjectFile/wasm/CMakeLists.txt
lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.h
lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
lldb/source/Utility/ArchSpec.cpp
lldb/test/Shell/ObjectFile/wasm/basic.yaml
lldb/test/Shell/ObjectFile/wasm/embedded-debug-sections.yaml
lldb/test/Shell/ObjectFile/wasm/stripped-debug-sections.yaml
lldb/tools/lldb-test/SystemInitializerTest.cpp

Differential D71575

[LLDB] Add ObjectFileWasm plugin for WebAssembly debugging
ClosedPublic

Authored by paolosev on Dec 16 2019, 3:16 PM.

Details

Reviewers

jasonmolenda
clayborg
labath

Commits

rG4bafceced6a7: [LLDB] Add ObjectFileWasm plugin for WebAssembly debugging

Summary

This is the first in a series of patches to enable LLDB debugging of WebAssembly targets.

Current versions of Clang emit (partial) DWARF debug information in WebAssembly modules and we can leverage this debug information to give LLDB the ability to do source-level debugging of Wasm code that runs in a WebAssembly engine.

A way to do this could be to use the remote debugging functionalities provided by LLDB via the GDB-remote protocol. Remote debugging can indeed be useful not only to connect a debugger to a process running on a remote machine, but also to connect the debugger to a managed VM or script engine that runs locally, provided that the engine implements a GDB-remote stub that offers the ability to access the engine runtime internal state.

To make this work, the GDB-remote protocol would need to be extended with a few Wasm-specific custom query commands, used to access aspects of the Wasm engine state (like the Wasm memory, Wasm local and global variables, and so on).
Furthermore, the DWARF format would need to be enriched with a few Wasm-specific extensions, here detailed: https://yurydelendik.github.io/webassembly-dwarf.

This CL introduces the class ObjectFileWasm, a file plugin that represents a Wasm module loaded in a debuggee process. It knows how to parse Wasm modules and store the Code section and the DWARF-specific sections.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

paolosev created this revision. Dec 16 2019, 3:16 PM

Herald added subscribers: llvm-commits, lldb-commits, sunfish and 6 others. Dec 16 2019, 3:16 PM

paolosev edited the summary of this revision. Dec 16 2019, 3:19 PM

Sounds exciting. My comments are all about formatting and coding style; if someone has something technical to say too, that would be appreciated.

lldb/source/Plugins/DynamicLoader/WASM-DYLD/DynamicLoaderWasmDYLD.cpp
1 ↗	(On Diff #234168)	This hangs over the line, and a -- C++ -- marker is only necessary for .h files, where C vs. C++ is ambiguous.
77 ↗	(On Diff #234168)	This should be a doxygen comment.
132 ↗	(On Diff #234168)	Can you convert this to early exits? The deep nesting makes the control flow difficult to read.
lldb/source/Plugins/ObjectFile/WASM/ObjectFileWasm.cpp
1 ↗	(On Diff #234168)	ditto
32 ↗	(On Diff #234168)	Is a StringRef comparison easier to read here?
72 ↗	(On Diff #234168)	Do you want any form of error logging/handling here?
76 ↗	(On Diff #234168)	Please use full sentences in comments with a trailing `.`.
99 ↗	(On Diff #234168)	early-exitify?
150 ↗	(On Diff #234168)	Again early exits would make this easier to read.
394 ↗	(On Diff #234168)	doxygen comment.
lldb/source/Plugins/ObjectFile/WASM/ObjectFileWasm.h
46 ↗	(On Diff #234168)	FYI you can group doxygen comments like this: /// PluginInterface protocol /// \{ ... /// \}
102 ↗	(On Diff #234168)	`llvm::Optional<uint8_t> GetVaruint7()`
104 ↗	(On Diff #234168)	same here
119 ↗	(On Diff #234168)	doxygen comments on all (most) non-override function would be useful.
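
The early-exit requests in the comments above can be illustrated with a minimal C++17 sketch. The `Section` struct and `FindDebugSectionSize` function are hypothetical and do not come from the patch; they only show how returning on each failure condition keeps the successful path at the outermost indentation level.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed section; not the patch's actual type.
struct Section {
  std::string name;
  uint64_t size;
};

// Every failure condition returns immediately, so the main logic is never
// buried inside nested if blocks.
std::optional<uint64_t>
FindDebugSectionSize(const std::vector<Section> *sections,
                     const std::string &name) {
  if (!sections)
    return std::nullopt;
  if (name.empty())
    return std::nullopt;
  for (const Section &section : *sections) {
    if (section.name != name)
      continue; // Early "exit" from this iteration.
    return section.size;
  }
  return std::nullopt;
}
```

The same shape works with `llvm::Optional`, which is what the reviewers suggest for the patch itself.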

I don't know the lldb codebase, but from a webassembly perspective this looks promising.

I suppose we are a long way from having a WebAssembly VM that exports the correct wire protocol to actually test this?

lldb/include/lldb/Utility/ArchSpec.h
191	Add another newline below to follow the existing grouping pattern?
lldb/source/API/SystemInitializerFull.cpp
72	Can you name this directory "wasm" rather than "WASM", since it's not an acronym?
177	Why is the namespace needed here for wasm but not for the other three above? It seems inconsistent.
lldb/source/Plugins/DynamicLoader/WASM-DYLD/DynamicLoaderWasmDYLD.h
1 ↗	(On Diff #234168)	Again, avoid WASM in the directory name. I suppose "Wasm" would also be acceptable, but I'm trying to push for "wasm" or "WebAssembly".
lldb/source/Utility/ArchSpec.cpp
108	Is this just clang format being greedy?
lldb/unittests/ObjectFile/WASM/CMakeLists.txt
11 ↗	(On Diff #234168)	trailing newline
lldb/unittests/ObjectFile/WASM/TestObjectFileWasm.cpp
2 ↗	(On Diff #234168)	Something's up with the line wrapping here.
llvm/include/llvm/BinaryFormat/ELF.h
314 ↗	(On Diff #234168)	This seems like an odd place to add this, given that we are not using or relying on ELF anywhere. Does this make sense?

I think this is really nice. I have some minor remarks here and there but otherwise this LGTM.

lldb/source/Plugins/DynamicLoader/WASM-DYLD/DynamicLoaderWasmDYLD.cpp
88 ↗	(On Diff #234168)	What about making this `if` and the one below an `if (ModuleSP module_sp = ...) { ...; return module_sp; }`? Then you don't need to do the double-parentheses trick, and the end of this function can just be `return ModuleSP();` so it is obvious that the end of this function is the error code path.
171 ↗	(On Diff #234168)	no brackets
lldb/source/Plugins/ObjectFile/WASM/ObjectFileWasm.cpp
32 ↗	(On Diff #234168)	(IMHO it is)
65 ↗	(On Diff #234168)	no brackets
79 ↗	(On Diff #234168)	No brackets
88 ↗	(On Diff #234168)	no brackets
110 ↗	(On Diff #234168)	I don't think we do these `static` comments usually in LLVM?
128 ↗	(On Diff #234168)	Single line if -> no brackets needed.
172 ↗	(On Diff #234168)	Should also be switched to ConstString if you make the member of the section info a ConstString.
191 ↗	(On Diff #234168)	This member is only created here and only used below from what I can see? Also you never compare it against any other strings so it should be a `std::string`.
292 ↗	(On Diff #234168)	range-based loop
299 ↗	(On Diff #234168)	Your `sect_info.name` is already a std::string, so comparing here against a ConstString is just slower and less readable.
447 ↗	(On Diff #234168)	early-exit
449 ↗	(On Diff #234168)	Bonus: You can write this code by directly using `llvm::raw_ostream` by just calling `s->AsRawOstream()` to get the equivalent `raw_ostream`. I migrate all code to LLVM's stream classes so not having more code using lldb::Stream would be nice (but not required to get this patch in). (same for the Stream code below)
lldb/source/Plugins/ObjectFile/WASM/ObjectFileWasm.h
116 ↗	(On Diff #234168)	This should be a `ConstString`. OR you keep this as a `std::string` and then you move all other ConstString variables that compare against your section names to be just plain `std::string`, `llvm::StringRef` and `llvm::StringSwitch`. But mixing `ConstString` and normal strings brings us the disadvantages of both worlds (hard to read and slow to compare) without any benefits.
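
The `if (ModuleSP module_sp = ...)` suggestion above can be sketched as follows. `Module`, `FindLoadedModule` and `LoadModuleFromDisk` are hypothetical placeholders rather than LLDB APIs; the point is only that declaring the variable inside the condition avoids the double-parentheses assignment and leaves the final `return ModuleSP();` as the single, obvious failure path.

```cpp
#include <memory>
#include <string>

// Hypothetical module type; LLDB's real ModuleSP points to lldb_private::Module.
struct Module {
  std::string name;
};
using ModuleSP = std::shared_ptr<Module>;

// Hypothetical lookup helpers; each returns an empty pointer on failure.
ModuleSP FindLoadedModule(const std::string &name) {
  if (name == "loaded.wasm")
    return std::make_shared<Module>(Module{name});
  return ModuleSP();
}

ModuleSP LoadModuleFromDisk(const std::string &name) {
  if (name == "on_disk.wasm")
    return std::make_shared<Module>(Module{name});
  return ModuleSP();
}

ModuleSP GetModule(const std::string &name) {
  // The declaration doubles as the condition: the branch is taken only when
  // the shared pointer is non-empty.
  if (ModuleSP module_sp = FindLoadedModule(name))
    return module_sp;
  if (ModuleSP module_sp = LoadModuleFromDisk(name))
    return module_sp;
  // Reaching this point means every lookup failed.
  return ModuleSP();
}
```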

I think this is pretty good for a first pass, but I would like to see this split up into (at least) three patches (one for each plugin). That way, we can properly focus on each plugin. For instance, I'm pretty sure that the object file and symbol vendor changes are fully testable. The dynamic loader stuff may or may not be, but I don't want to discuss that yet to avoid too many parallel threads going on.

lldb/source/API/SystemInitializerFull.cpp
177	Some of the older code puts "plugins" into the default namespace, but lately we've started to put new plugins into their own namespaces. However, most of the old plugins have not been migrated yet.
lldb/source/Plugins/ObjectFile/WASM/ObjectFileWasm.cpp
299 ↗	(On Diff #234168)	ELF and PECOFF code has been already converted to use StringSwitch for this stuff. I'd do the same here.
llvm/include/llvm/BinaryFormat/ELF.h
314 ↗	(On Diff #234168)	Indeed, that looks very unexpected, and begs an explanation.
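
As a rough illustration of the name-to-type mapping discussed above, here is a standalone C++17 sketch. It deliberately uses a plain lookup table over `std::string_view` as a stand-in, so that it compiles without LLVM headers; in the real code the reviewers recommend `llvm::StringSwitch`. The `SectionType` enum and the set of names are assumptions made for the example.

```cpp
#include <string_view>

// Hypothetical section categories; LLDB uses lldb::SectionType instead.
enum class SectionType { DebugAbbrev, DebugInfo, DebugLine, DebugStr, Other };

// Maps a section name to a type with ordinary string comparisons, without
// interning anything in a global string pool.
SectionType GetSectionTypeFromName(std::string_view name) {
  struct Entry {
    std::string_view name;
    SectionType type;
  };
  static constexpr Entry kKnownSections[] = {
      {".debug_abbrev", SectionType::DebugAbbrev},
      {".debug_info", SectionType::DebugInfo},
      {".debug_line", SectionType::DebugLine},
      {".debug_str", SectionType::DebugStr},
  };
  for (const Entry &entry : kKnownSections)
    if (entry.name == name)
      return entry.type;
  return SectionType::Other;
}
```

With `llvm::StringSwitch` the table collapses into a chain of `.Case(...)` calls ending in `.Default(...)`, which is the form the comment above refers to.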

Thanks for all the comments! I am updating the code following your suggestions.
Next step will be to split this into three distinct patches, as suggested by Pavel.

lldb/source/Plugins/DynamicLoader/WASM-DYLD/DynamicLoaderWasmDYLD.cpp
1 ↗	(On Diff #234168)	Fixed, but I see the -- C++ -- in all files.
lldb/source/Plugins/ObjectFile/WASM/ObjectFileWasm.cpp
1 ↗	(On Diff #234168)	But all the C++ files seem to have "-- C++ --===//" in the first line.
lldb/source/Utility/ArchSpec.cpp
108	Yes, it was edited by clang format.

paolosev added inline comments. Dec 17 2019, 10:07 PM

llvm/include/llvm/BinaryFormat/ELF.h
314 ↗	(On Diff #234168)	Oops... my mistake. This is a relic from an old version where I expected Wasm stripped debug symbol might also come in an ELF file. Removed.

Addressed first review comments.

teemperor added inline comments. Dec 18 2019, 12:08 AM

lldb/source/Plugins/DynamicLoader/WASM-DYLD/DynamicLoaderWasmDYLD.cpp
1 ↗	(On Diff #234168)	The line just got cargo-culted into many *.cpp files so all occurrences in source files are unintentional.
lldb/source/Utility/ArchSpec.cpp
108	Can you revert that change? clang-format shouldn't touch these unrelated files if you use git-clang-format or something similar.

paolosev updated this revision to Diff 234471. Dec 18 2019, 12:40 AM

paolosev retitled this revision from "[LLDB] Add initial support for WebAssembly debugging" to "[LLDB] Add ObjectFileWasm plugin for WebAssembly debugging".

paolosev edited the summary of this revision.

paolosev marked 2 inline comments as done. Dec 18 2019, 12:48 AM

paolosev updated this revision to Diff 234474. Dec 18 2019, 12:51 AM

paolosev set the repository for this revision to rG LLVM Github Monorepo.

aprantl added inline comments. Dec 19 2019, 11:48 AM

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
46	The LLVM coding style requests that doxygen comments should be on the declaration in the header file and not in the implementation.
376	again.. in the header, or inside the function
453	ditto
lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.h
19	This line is redundant and can be removed.
26	this comment is inconsistent with the others
116	Please make sure to use full sentences that end in a `.` in all comments.

paolosev updated this revision to Diff 234920. Dec 20 2019, 10:05 AM

paolosev marked 7 inline comments as done.

paolosev added inline comments.

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
46	This function is not declared in the header. Probably it doesn't need a doxygen comment.

aprantl added inline comments. Dec 20 2019, 1:02 PM

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
46	If it isn't declared in a header, then it should be either static or in an anonymous namespace. Using a doxygen comment is still preferred, since many IDEs will use that to display online help.

labath added inline comments. Dec 20 2019, 1:24 PM

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
41	This looks like it will cause problems on big endian hosts..
253	I take it that wasm files don't have anything like a build id, uuid or something similar?
335–352	It would be nice to merge these two section-creating blocks..
lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.h
117	We don't use this typedef style (except possibly in some old code which we shouldn't emulate).
lldb/unittests/ObjectFile/wasm/TestObjectFileWasm.cpp
1 ↗	(On Diff #234920)	Overall, these tests would be better off as "lit" tests. Something along the lines of: `yaml2obj %s > %t`, then `lldb-test object-file %t | FileCheck %s`. You can look at existing tests in `test/Shell/ObjectFile` for inspiration. Is there anything you test here that "lldb-test object-file" does not print out?

A bunch more comments from me. :)

A higher level question I have is whether there's something suitable within llvm for parsing wasm object files that could be reused. I know this can be tricky with files loaded from memory etc., so it's fine if it isn't possible to do that, but I am wondering if you have considered this option...

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
36–39	`if (llvm::identify_magic(toStringRef(data_sp->GetData())) != llvm::file_magic::wasm_object)` maybe ?
41	One way to get around that would be to use something like `llvm::support::read32le`.
47–64	Maybe merge these and make the maximum width a function argument?
193	I'd just use StringRef here -- there's no advantage in ConstStringifying this...
216	This seems odd -- I don't think any of our object file plugins work this way. It's normally the symbol vendor who fiddles with the symbol file spec. This is kind of similar to the gnu_debuglink section, and the way that works in elf is that the object file exposes this via a separate method, which the symbol vendor can then query and do the appropriate thing. Maybe you could just drop this part and we can get back to it with the symbol vendor patch?
218–226	I wouldn't be afraid of presenting `external_debug_info` as an actual section, if that's how it's treated by the object file format. And it looks like that could simplify this code a bit...
343–344	Are the debug info sections actually loaded into memory for wasm? Should these be zero (that's what they are for elf)?
383	This is strange.. I wouldn't expect that the section decoding logic should depend on the actual address that the object is loaded in memory. Can you explain the reasoning here?
396	Normally I would expect to see `->GetFileAddress()` here, as that's the thing which says how the sections are laid out in memory. The way you create these sections, the two values are the same, but it still seems more correct to call GetFileAddress()
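
To make the big-endian concern above concrete, here is a small standalone sketch of a byte-order-independent header check. `Read32LE` is a hand-written stand-in for what `llvm::support::endian::read32le` provides, and `LooksLikeWasmModule` is a hypothetical helper, not the function in the patch. Rejecting any version other than 1 is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Assembles a 32-bit value from four little-endian bytes. Using shifts
// instead of casting the pointer to uint32_t* gives the same result on
// little-endian and big-endian hosts.
uint32_t Read32LE(const uint8_t *p) {
  return static_cast<uint32_t>(p[0]) | static_cast<uint32_t>(p[1]) << 8 |
         static_cast<uint32_t>(p[2]) << 16 | static_cast<uint32_t>(p[3]) << 24;
}

// A Wasm binary starts with the bytes "\0asm" followed by a 32-bit
// little-endian version number (currently 1).
bool LooksLikeWasmModule(const uint8_t *data, size_t size) {
  constexpr uint32_t kWasmMagic = 0x6d736100; // "\0asm" read as little-endian.
  constexpr uint32_t kWasmVersion = 1;
  if (size < 8)
    return false;
  return Read32LE(data) == kWasmMagic && Read32LE(data + 4) == kWasmVersion;
}
```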

In D71575#1793791, @labath wrote:

A bunch more comments from me. :)

A higher level question I have is whether there's something suitable within llvm for parsing wasm object files that could be reused. I know this can be tricky with files loaded from memory etc., so it's fine if it isn't possible to do that, but I am wondering if you have considered this option.

I have considered this option, there is indeed some code duplication because there is already a Wasm parser in class WasmObjectFile (llvm/include/llvm/Object/Wasm.h).
However class WasmObjectFile is initialized with a memory buffer that contains the whole module, and this is not ideal. LLDB might create an ObjectFileWasm either from a file or by retrieving specific module chunks from a remote, but it doesn't often need to load the whole module in memory.
I think this is the reason why other kinds of object files (like ObjectFileELF) are re-implemented in LLDB rather than reusing existing code in LLVM (like ELFFile in llvm/include/llvm/Object/ELF.h).

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
41	Good point, modified to use read32le.
253	Not yet. There is a proposal to add a uuid to wasm modules in a custom section, but it's not part of the standard yet.
lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.h
117	Yes, this was copied from old code :). Removed.

Addressed more review comments:

removed code to manage "external_debug_info" sections; logic for this will be implemented in the symbol vendor code.
modified test code, from unittests to be Shell "lit" tests.

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
47–64	I'd keep the functions separate, it's better if they return different sized integers.
343–344	Yes, I was thinking that the debug sections should be loaded into memory; not sure how this works for ELF, how does the debugger find the debug info in that case?
383	There is a reason for this: DecodeNextSection() calls: ReadImageData(offset, size) and when the debug info sections are embedded in the wasm module (not loaded from a separated symbol file), ReadImageData() calls Process::ReadMemory() that, for a GDB remote connection, goes to ProcessGDBRemote::DoReadMemory(). Here I pass the offset because it also represents the specific id of the module loaded in the target debuggee process, as explained in the comment above.

I would suggest removing GetVaruint7 and GetVaruint32 and adding "llvm::Optional<uint8_t> DataExtractor::GetULEB128(uint64_t *offset_ptr, uint64_t max_value);" as mentioned in inlined comments.

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
50	Is it ok if we consume more than 1 byte here? What if the offset points to a larger ULEB: are we ok with advancing the offset by multiple bytes, or should we back it up and return llvm::None? This might be a good candidate to add to DataExtractor directly as:
/// Extract a ULEB128 number with a specified max value. If the extracted value exceeds
/// "max_value" the offset will be left unchanged and llvm::None will be returned.
llvm::Optional<uint8_t> DataExtractor::GetULEB128(uint64_t *offset_ptr, uint64_t max_value);
There are many places where we extract a uint64_t but only need a uint16_t (like in the DWARF parser, where all DW_TAG_XXXX, DW_AT_XXX and DW_FORM_XXX values must only be uint16_t values but are encoded as ULEB128 values). So this could be used elsewhere if we do put it into DataExtractor.
60	remove if we add: llvm::Optional<uint8_t> DataExtractor::GetULEB128(uint64_t *offset_ptr, uint64_t max_value);
163–164	Is this a one byte section ID or is it a ULEB? Not sure why it would be encoded as a ULEB if it is always one byte? IF this really is just a one byte value, then replace with: uint8_t section_id = data.GetU8(&offset);
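
The bounded ULEB128 extraction proposed above could look roughly like the following standalone C++17 sketch. It is a free function over a raw buffer rather than a `DataExtractor` member, it returns `std::optional<uint64_t>` instead of `llvm::Optional`, and the exact signature is an assumption. The property it demonstrates is the one requested in the comment: on any failure the offset is left unchanged.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Decodes a ULEB128 value starting at *offset_ptr. On success the offset is
// advanced past the encoded bytes. On failure (truncated input, a value that
// does not fit in 64 bits, or a value above max_value) the offset is left
// unchanged and std::nullopt is returned.
std::optional<uint64_t> GetULEB128(const uint8_t *data, size_t size,
                                   size_t *offset_ptr, uint64_t max_value) {
  uint64_t result = 0;
  unsigned shift = 0;
  size_t offset = *offset_ptr; // Work on a copy so failures have no effect.
  while (offset < size) {
    const uint8_t byte = data[offset++];
    // Reject encodings that would overflow 64 bits.
    if (shift >= 64 || (shift == 63 && (byte & 0x7f) > 1))
      return std::nullopt;
    result |= static_cast<uint64_t>(byte & 0x7f) << shift;
    shift += 7;
    if ((byte & 0x80) == 0) {
      if (result > max_value)
        return std::nullopt;
      *offset_ptr = offset;
      return result;
    }
  }
  return std::nullopt; // Ran out of data before the terminating byte.
}
```

A caller that needs a varuint7 would pass `0x7f` as `max_value`, and a caller that needs a varuint32 would pass `UINT32_MAX`.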

clayborg added inline comments. Jan 6 2020, 1:12 PM

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
35	I typically would put "data_sp" into a DataExtractor and extract a uint32_t and then check the decoded value with something like: static bool ValidateModuleHeader(DataExtractor &data, uint64_t *offset_ptr) { auto magic = data.GetU32(offset_ptr); if (magic == WASM_MAGIC) return true; if (magic == WASM_CIGAM) { // Set byte order in DataExtractor data.SetByteOrder(data.GetByteOrder() == eByteOrderBig ? eByteOrderLittle : eByteOrderBig); return true; } return false; } This function expects a DataExtractor to be passed in that has "data_sp" inside of it with the host endian set as the byte order. It will set the byte order correctly. It also expects to have two uint32_t macros defined: WASM_MAGIC and WASM_CIGAM. These contain the non byte swapped and the byte swapped magic values. Easy to replace those with real definitions from else where (llvm::file_magic::wasm_object? Not sure of the type of this though, seemed like a StringRef).
45	use DataExtractor::GetU32()? Or is the byte order always little endian for wasm object files?
188–196	All these lines can use the GetCStr with a length: ConstString sect_name(data.GetCStr(&offset, *name_len)); if (!sect_name) return false;
200	remove ConstString constructor here if we switch code above as suggested.

In D71575#1803321, @paolosev wrote:

In D71575#1793791, @labath wrote:

A bunch more comments from me. :)

A higher level question I have is whether there's something suitable within llvm for parsing wasm object files that could be reused. I know this can be tricky with files loaded from memory etc., so it's fine if it isn't possible to do that, but I am wondering if you have considered this option.

I have considered this option, there is indeed some code duplication because there is already a Wasm parser in class WasmObjectFile (llvm/include/llvm/Object/Wasm.h).
However class WasmObjectFile is initialized with a memory buffer that contains the whole module, and this is not ideal. LLDB might create an ObjectFileWasm either from a file or by retrieving specific module chunks from a remote, but it doesn't often need to load the whole module in memory.
I think this is the reason why other kinds of object files (like ObjectFileELF) are re-implemented in LLDB rather than reusing existing code in LLVM (like ELFFile in llvm/include/llvm/Object/ELF.h).

Thanks for sharing your thoughts. I think the history of ObjectFileELF is a bit more complicated, but yes, being able to load from memory is a good reason for not using the llvm reader right now (also, the parsing code seems to be quite simple).

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
253	Ok. Thanks for clarifying. I think something like that will be useful, as otherwise you can't tell if you're using the correct separate debug info file.
343–344	Are you referring to the load-from-memory, or load-from-file scenario? Normally, the debug info is not loaded into memory, and if lldb is loading the module from a file, then the debug info is loaded from the file (and we use the "file offset" field to locate them). Loading files from memory does not work in general (because we can't find the whole file there -- e.g., section headers are missing). The only case it does work is in case of jitted files. I'm not 100% sure how that works, but given that there is no `if(memory)` block in the section creation code, those sections must also get `vm size = 0`. In practice, I don't think it matters that much what you put here -- I expect things will mostly work regardless. I am just trying to make this consistent in some way. If these sections are not found in the memory at the address which you set here (possibly adjusted by the load bias) then I think it makes sense to set `vm_size=vm_addr=0`. If they are indeed present there, then setting it to those values is perfectly fine.
383	What you say about ReadMemory makes sense, but it's not clear to me why you couldn't use the value in `m_memory_addr` for this. For in-memory object file this value should contain the actual address the object file was loaded from (which, I would expect, will include the `module_id` business), and so you wouldn't need the dynamic loader address in order to locate it. I believe this is how the other object file plugins do their in-memory loading..
lldb/source/Utility/ArchSpec.cpp
230	add a trailing comma to avoid subsequent needs to touch this line when adding new entries..
lldb/test/Shell/ObjectFile/wasm/basic.yaml
22–80	Could you remove the sections which are not relevant for this test? This makes it easier to see the correspondence between the yaml and the test expectations..
lldb/test/Shell/ObjectFile/wasm/debug-sections.yaml
42 ↗	(On Diff #236237)	Same here (and you don't even have to put actual valid debug info in the debug info sections -- just a couple of random bytes is sufficient).

paolosev updated this revision to Diff 236868. Jan 8 2020, 11:04 AM

paolosev marked 20 inline comments as done.

paolosev added inline comments.

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
35	Not sure about this... WebAssembly is always little-endian and a DataExtractor is not really needed here. I made sure to set the byte order where the DataExtractor is created in `ReadImageData`.
50	Modified in DataExtractor, as suggested. However, we need to return an `llvm::Optional<uint64_t>`, which may not be ideal because the result might need to be cast to another integer type. We could also make `DataExtractor::GetULEB128` templatized on the integer type, getting `max_value` as `std::numeric_limits<int>::max()`, but I don't want to overly complicate the code.
60	Actually, section_id is encoded as a byte not as a varuint7, so I modified the code accordingly, with `llvm::Optional<uint8_t> GetByte(DataExtractor &, lldb::offset_t *)`. Even this could be moved to DataExtractor, maybe?
163–164	You are right, it is a one-byte section id. I cannot use `data.GetU8(&offset);` though, because it returns 0 on failure.
188–196	I had tried to use `DataExtractor::GetCStr()` but it doesn't work because it wants null-terminated strings, and this is not the case in Wasm, where strings are encoded as len (varuint32) followed by an array of len bytes that represent UTF8 chars (https://webassembly.github.io/spec/core/binary/values.html#names).
343–344	Modified, but I am not sure I completely understood the difference of file addresses and vm addresses. In the case of Wasm, we can have two cases: a) Wasm module contains executable code and also DWARF data embedded in DWARF sections b) Wasm module contains code and has a custom section that points to a separated Wasm module that only contains DWARF sections The file of Wasm modules that contains code should never be loaded directly by LLDB; LLDB should use GDB-remote to connect to the Wasm engine that loaded the module and retrieve the module content from the engine. But when the DWARF data has been stripped into a separate Wasm file, that file should be directly loaded by LLDB in order to load the debug sections. So, if I am not wrong, only in the first case we should assume that the debug info is loaded into memory, and have vm_size != 0?
383	Good point! Changed.
lldb/test/Shell/ObjectFile/wasm/basic.yaml
22–80	Ok. What can be confusing is that Wasm modules (almost) always have at least a few standard sections (MEMORY, GLOBAL, CODE, ...) but these sections can be ignored by LLDB, so also by these tests.
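
The Wasm name encoding mentioned in the reply above (a varuint32 length followed by that many UTF-8 bytes, with no NUL terminator) can be sketched in standalone C++17 as below. `ReadVaruint32` and `ReadWasmName` are hypothetical helpers written for this example and do not mirror the patch's code; UTF-8 validation is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Minimal varuint32 (ULEB128, at most 5 bytes) decoder. The offset is only
// advanced on success.
std::optional<uint32_t> ReadVaruint32(const uint8_t *data, size_t size,
                                      size_t *offset) {
  uint32_t result = 0;
  size_t pos = *offset;
  for (unsigned shift = 0; shift < 35 && pos < size; shift += 7) {
    const uint8_t byte = data[pos++];
    // The fifth byte may only contribute the top 4 bits of the value.
    if (shift == 28 && (byte & 0x70) != 0)
      return std::nullopt;
    result |= static_cast<uint32_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      *offset = pos;
      return result;
    }
  }
  return std::nullopt;
}

// Reads a length-prefixed name. A NUL-terminated reader cannot be used here
// because the bytes are not followed by a terminator.
std::optional<std::string> ReadWasmName(const uint8_t *data, size_t size,
                                        size_t *offset) {
  size_t pos = *offset;
  const std::optional<uint32_t> len = ReadVaruint32(data, size, &pos);
  if (!len || *len > size - pos)
    return std::nullopt; // Truncated length or truncated name bytes.
  std::string name(reinterpret_cast<const char *>(data + pos), *len);
  *offset = pos + *len;
  return name;
}
```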

labath added inline comments. Jan 9 2020, 3:55 AM

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
343–344	You're right, this is getting pretty confusing, as a lot of the concepts we're talking about here are overloaded (and not even consistently used within lldb). Let me try to elaborate here. I'll start by describing various Section fields (I'll use ELF as a reference):
file offset (`m_file_offset`) - where the section is physically located in the file. It does not matter if the file is loaded from inferior memory or not (in the former case, this is an offset from location of the first byte of the file). In elf this corresponds to the `sh_offset` field.
file size (`m_file_size`) - size of the section in the file. Some sections don't take up any space in the file (`.bss`), so they have this zero. This is roughly the `sh_size` field in elf (but it is adjusted for SHT_NOBITS sections like .bss).
file address (`m_file_addr`) - the address where the object file says this section should be loaded to. Note that this may not be the final address due to ASLR or such. It is the job of the SetLoadAddress function to compute the actual final address (which is then called the "load address"). This corresponds to the `sh_addr` field in elf, but it is also sometimes called the "vm address", because of the `p_vaddr` program header field.
vm size (`m_byte_size`) - size of the section when it gets loaded into memory. Sections which don't get loaded into memory have this as 0. This is also "roughly" `sh_size`, but only for SHF_ALLOC sections.
All of these fields are really geared for the case where lldb opens a file from disk, and then uses that to understand the contents of process memory. They become pretty redundant if you have loaded the file from memory, but it should still be possible to assign meaningful values to each of them. The file offset+file size combo should reflect the locations of the sections within the file (regardless of whether it's a "real" file or just a chunk of memory).
And the file address+vm size combo should reflect the locations of the sections in memory (modulo ASLR, and again independently of where lldb happened to read the file from). Looking at the patch, I think you've got most of these right. The only question is, what is the actual memory layout of a wasm debug info in the inferior process. I originally highlighted this because I was assuming the typical elf model where the debug info is not loaded into memory and the debugger loads it from a file. (jitted files are an afterthought in elf, and not that well supported/tested). However, now I get the impression that the wasm debug info is actually always loaded into memory (assuming it is present, that is), and so setting the vm size to non zero might actually be correct here. You'll need to make the call there, as I don't know how wasm actually works. (Note that you needn't concern yourself much with separate debug files here -- the vm size only really matters once you call SetLoadAddress to actually mark the object file as loaded into the process, and we never call SetLoadAddress on a separate debug file).
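
The four section fields described above can be summarized with a small standalone sketch. `SectionLayout` is a hypothetical struct (not `lldb_private::Section`), and the offsets and addresses are made-up numbers chosen only to show the three typical ELF-like cases: code, `.bss`, and debug info that is never loaded.

```cpp
#include <cstdint>

// Hypothetical summary of the fields discussed above.
struct SectionLayout {
  uint64_t file_offset; // Where the section bytes start within the file.
  uint64_t file_size;   // Bytes occupied in the file (0 for .bss-like data).
  uint64_t file_addr;   // Address the object file asks to be loaded at.
  uint64_t vm_size;     // Bytes occupied in memory (0 if never loaded).
};

// Illustrative values only.
constexpr SectionLayout kText{0x1000, 0x200, 0x401000, 0x200};
constexpr SectionLayout kBss{0x3000, 0, 0x404000, 0x80};
constexpr SectionLayout kDebugInfo{0x3000, 0x900, 0, 0};

constexpr bool OccupiesFile(const SectionLayout &s) { return s.file_size != 0; }
constexpr bool IsLoadedIntoMemory(const SectionLayout &s) {
  return s.vm_size != 0;
}

// The load address is the file address adjusted by whatever slide (for
// example from ASLR) was applied when the module was loaded.
constexpr uint64_t LoadAddress(const SectionLayout &s, uint64_t slide) {
  return s.file_addr + slide;
}
```

Whether Wasm debug sections should look like `kDebugInfo` or carry a non-zero `vm_size` is exactly the question the comment leaves to the patch author.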

paolosev marked 2 inline comments as done. Jan 10 2020, 9:56 AM

paolosev added inline comments.

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
343–344	Thanks for the clarification! WebAssembly modules are not really loaded into the memory of the inferior process like native executables are. The Wasm/JavaScript engine loads the module and executes it, either interpreting the Wasm bytecode or jit-compiling it into native code. Furthermore, in WebAssembly the code is distinct from the addressable memory. All this poses some interesting problems for enabling debugging with LLDB. My idea is that a Wasm engine that wants to support LLDB debugging should implement a GDB remote stub, and LLDB would not directly access the inferior memory but only access it by talking with the engine through the remote protocol. As far as debug info is concerned, by default Clang embeds it in a few custom sections in the Wasm module itself. But normally these debug sections should be stripped into a separate Wasm file (we should modify llvm-objcopy for this) because they are not useful in the engine and take a lot of space. Usually an engine loads and keeps the whole Wasm module in memory, so when the debug info is embedded in the module, we can say that the debug info is loaded into the inferior process but it’s only accessible through the engine. Here SetLoadAddress will be used to specify at which ‘virtual address’ the engine loaded the module; for example a module with id==4 will be loaded at address 0x00000004`00000000 (logic for this will be implemented in a new class DynamicLoaderWasm). LLDB will use gdb-remote commands like ‘m’ to read chunks of the debug sections from a loaded Wasm module, and the remote stub identifies the module from the id encoded in the address. When the debug info is in a separate file, as you say we don’t call SetLoadAddress on that file, and instead LLDB will use m_file_offset, m_file_size to read the debug info directly from the file (m_file_offset, m_file_size can be 0). The code above should be consistent with this logic.
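
The address scheme described above, where a module with id 4 lives at 0x00000004`00000000, amounts to placing the module id in the upper 32 bits of a 64-bit address and the offset within the module in the lower 32 bits. A minimal sketch follows; the helper names are hypothetical, since the comment only says that the real logic is planned for DynamicLoaderWasm.

```cpp
#include <cstdint>

// Packs a module id and an offset within that module into one 64-bit
// "virtual address" that can travel over the gdb-remote protocol.
constexpr uint64_t MakeWasmAddress(uint32_t module_id, uint32_t offset) {
  return (static_cast<uint64_t>(module_id) << 32) | offset;
}

// The remote stub can recover both parts from the address it receives.
constexpr uint32_t GetModuleId(uint64_t address) {
  return static_cast<uint32_t>(address >> 32);
}

constexpr uint32_t GetModuleOffset(uint64_t address) {
  return static_cast<uint32_t>(address & 0xffffffffu);
}

static_assert(MakeWasmAddress(4, 0) == 0x0000000400000000ull,
              "module 4 starts at 0x00000004`00000000");
```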

I apologize for the noob question, but how do I schedule a build for this diff with Harbormaster?

paolosev added a child revision: D72650: [LLDB] Add SymbolVendorWasm plugin for WebAssembly debugging. Jan 13 2020, 2:18 PM

Sorry for the delay. I was trying to figure out whether I want to get into the whole DataExtractor discussion or not -- I eventually did... :/

Besides that bit, I think this is looking good..

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
343–344	Cool, thanks for explaining. I think we've both learned a lot here.
lldb/source/Utility/DataExtractor.cpp
914 ↗	(On Diff #236868)	It doesn't look like this actually happens, does it? (If max_value is exceeded, the offset will still be updated, right?). And overall, I am not very happy with backdooring an api inconsistent with the rest of the DataExtractor (I am aware it was clayborg's idea). Overall, it would probably be better to use the llvm DataExtractor class, which has the Cursor interface designed to solve some of the problems you have here (it can handle EOF, it cannot check the uleb magnitude). And it tries to minimize the number of times you need to error check everything. The usage of it could be something like: llvm::DataExtractor llvm_data = lldb_data.GetAsLLVM(); llvm::DataExtractor::Cursor c(0); unsigned id = llvm_data.GetU8(c); unsigned payload_len = llvm_data.GetULEB128(c); if (!c) return c.takeError(); // id and payload_len are valid here if (id == 0) { unsigned name_len = llvm_data.GetULEB128(c); SmallVector<uint8_t, 32> name_storage; llvm_data.GetU8(c, name_storage, name_len); if (!c) return c.takeError(); // name_len and name valid here StringRef name = toStringRef(makeArrayRef(name_storage)); unsigned section_length = ...; m_sect_infos.push_back(...) } This won't handle the uleb magnitude check, but these checks seem irrelevant and/or subsumable by other, more useful checks: a) Checking the name length is not necessary, as the code will fail for any names longer than 1024 anyway (as that's the amount of data you read); b) instead of `section_len < 2^32` it seems more useful to check that `*offset_ptr + section_len` is less than `2^32`, to make sure we don't wrap the `module_id` part of the "address".
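
The two ideas in the comment above, a cursor whose error state is sticky so that several reads can share one check, and a bound on `offset + section_len`, can be illustrated with a standalone sketch. The `Cursor` class here is a simplified stand-in written for this example, not `llvm::DataExtractor::Cursor`, and `SectionHeader` and `ReadSectionHeader` are hypothetical names. The sketch does not validate over-long ULEB128 encodings.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified reader: once any read fails, every later read is a no-op that
// returns 0, so callers can check for errors once per group of reads.
class Cursor {
public:
  Cursor(const uint8_t *data, size_t size) : m_data(data), m_size(size) {}

  uint8_t GetU8() {
    if (m_failed || m_offset >= m_size) {
      m_failed = true;
      return 0;
    }
    return m_data[m_offset++];
  }

  uint64_t GetULEB128() {
    uint64_t result = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
      const uint8_t byte = GetU8();
      if (m_failed)
        return 0;
      result |= static_cast<uint64_t>(byte & 0x7f) << shift;
      if ((byte & 0x80) == 0)
        return result;
    }
    m_failed = true; // More than 10 bytes: not a valid 64-bit ULEB128.
    return 0;
  }

  explicit operator bool() const { return !m_failed; }
  size_t Offset() const { return m_offset; }

private:
  const uint8_t *m_data;
  size_t m_size;
  size_t m_offset = 0;
  bool m_failed = false;
};

struct SectionHeader {
  uint8_t id;
  uint64_t payload_len;
  uint64_t payload_offset;
};

bool ReadSectionHeader(Cursor &c, SectionHeader &header) {
  const uint8_t id = c.GetU8();
  const uint64_t payload_len = c.GetULEB128();
  if (!c)
    return false; // One check covers both reads.
  // Require payload_offset + payload_len < 2^32 so that the end of the
  // section cannot spill into the module id bits of the address.
  constexpr uint64_t kLimit = uint64_t{1} << 32;
  const uint64_t payload_offset = c.Offset();
  if (payload_offset >= kLimit || payload_len >= kLimit - payload_offset)
    return false;
  header = {id, payload_len, payload_offset};
  return true;
}
```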

In D71575#1814658, @paolosev wrote:

I apologize for the noob question, but how do I schedule a build for this diff with Harbormaster?

Harbormaster is a red herring. There's no automated pre-commit testing in llvm (TBE, there's an experimental @merge_guards_bot, which you can opt into, but it doesn't test or build lldb yet, so it's not very useful for you now...).

paolosev updated this revision to Diff 238168. Jan 14 2020, 10:34 PM

paolosev marked 2 inline comments as done.

paolosev added inline comments.

lldb/source/Utility/DataExtractor.cpp
914 ↗	(On Diff #236868)	Good points! Changed.

Thanks. I think this is looking very good now. Excited to have this ready.

Do you have commit access?

This revision is now accepted and ready to land. Jan 15 2020, 12:56 AM

In D71575#1821312, @labath wrote:

Thanks. I think this is looking very good now. Excited to have this ready.

Do you have commit access?

No, I certainly don't have commit access, this would be my first accepted patch. :)

Closed by commit rG4bafceced6a7: [LLDB] Add ObjectFileWasm plugin for WebAssembly debugging (authored by Paolo Severini <paolosev@microsoft.com>, committed by dschuff). Jan 15 2020, 4:32 PM

This revision was automatically updated to reflect the committed changes.

jingham mentioned this in rGcd9e5c32302c: Fix the macos build after D71575. Jan 15 2020, 6:15 PM

BTW, I had to fix this patch (cd9e5c32302cd3b34b796683eedb072c6a1cfdc1) to build on macOS. uint64_t and size_t are differently spelled (though I think otherwise equivalent). One is "long long unsigned int", the other "long unsigned int". I have no idea why that's true, but std::min refuses to compare a size_t and a uint64_t. Anyway, I fixed this by casting one of the two sides of the comparison. But this was causing problems because we have an api (ReadImageData) that takes a uint64_t for the offset and a size_t for the size. That seems a little weird to me, why are these different types?

In D71575#1823252, @jingham wrote:

BTW, I had to fix this patch (cd9e5c32302cd3b34b796683eedb072c6a1cfdc1) to build on macOS. uint64_t and size_t are differently spelled (though I think otherwise equivalent). One is "long long unsigned int", the other "long unsigned int". I have no idea why that's true, but std::min refuses to compare a size_t and a uint64_t. Anyway, I fixed this by casting one of the two sides of the comparison. But this was causing problems because we have an api (ReadImageData) that takes a uint64_t for the offset and a size_t for the size. That seems a little weird to me, why are these different types?

I am sorry for this problem, and thank you for the fix! Evidently size_t can also be 32 bit, like in this case.
To be more precise, ReadImageData should take an lldb::offset_t as its first argument (which is indeed a uint64_t) and it should use the same type for the size; I'll clean this up in a separate patch.
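The build break discussed above comes from template argument deduction: `std::min(a, b)` needs both arguments to have the same type. The sketch below shows one common way around it, under the assumption that a helper clamps a read size to the bytes left in an image. `ClampReadSize` is a hypothetical function written for this example, and it is not the change made in cd9e5c32302c.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: clamps a requested read size to the bytes remaining
// after `offset` in an image of `image_size` bytes.
inline size_t ClampReadSize(uint64_t image_size, uint64_t offset, size_t size) {
  if (offset >= image_size)
    return 0;
  uint64_t remaining = image_size - offset;
  // std::min(remaining, size) fails to compile where uint64_t and size_t are
  // distinct types, because deduction finds two different candidates for T.
  // Naming T explicitly converts both operands to uint64_t first.
  return static_cast<size_t>(std::min<uint64_t>(remaining, size));
}
```

The final cast back to `size_t` is safe here because the result is never larger than `size`, which already fits in a `size_t`.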

In D71575#1822343, @paolosev wrote:

In D71575#1821312, @labath wrote:

Do you have commit access?

No, I certainly don't have commit access, this would be my first accepted patch. :)

Well.. congratulations. :) I was making sure you are able to commit this, but it looks like you already have that covered.

Revision Contents

Path	Size
lldb/include/lldb/Utility/ArchSpec.h	2 lines
lldb/source/API/SystemInitializerFull.cpp	3 lines
lldb/source/Plugins/ObjectFile/CMakeLists.txt	1 line
lldb/source/Plugins/ObjectFile/wasm/CMakeLists.txt	11 lines
lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.h	138 lines
lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp	435 lines
lldb/source/Utility/ArchSpec.cpp	9 lines
lldb/test/Shell/ObjectFile/wasm/basic.yaml	30 lines
lldb/test/Shell/ObjectFile/wasm/embedded-debug-sections.yaml	67 lines
lldb/test/Shell/ObjectFile/wasm/stripped-debug-sections.yaml	54 lines
lldb/tools/lldb-test/SystemInitializerTest.cpp	3 lines

Diff 238168

lldb/include/lldb/Utility/ArchSpec.h

[182 lines not shown]
enum Core {
eCore_hexagon_hexagonv4,		eCore_hexagon_hexagonv4,
eCore_hexagon_hexagonv5,		eCore_hexagon_hexagonv5,

eCore_uknownMach32,		eCore_uknownMach32,
eCore_uknownMach64,		eCore_uknownMach64,

eCore_arc, // little endian ARC		eCore_arc, // little endian ARC

		eCore_wasm32,
		sbc100 (Done): Add another newline below to follow the existing grouping pattern?

kNumCores,		kNumCores,

kCore_invalid,		kCore_invalid,
// The following constants are used for wildcard matching only		// The following constants are used for wildcard matching only
kCore_any,		kCore_any,
kCore_arm_any,		kCore_arm_any,
kCore_ppc_any,		kCore_ppc_any,
kCore_ppc64_any,		kCore_ppc64_any,
[342 lines not shown]

lldb/source/API/SystemInitializerFull.cpp

[63 lines not shown]
#include "Plugins/LanguageRuntime/RenderScript/RenderScriptRuntime/RenderScriptRuntime.h"		#include "Plugins/LanguageRuntime/RenderScript/RenderScriptRuntime/RenderScriptRuntime.h"
#include "Plugins/MemoryHistory/asan/MemoryHistoryASan.h"		#include "Plugins/MemoryHistory/asan/MemoryHistoryASan.h"
#include "Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.h"		#include "Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.h"
#include "Plugins/ObjectContainer/Universal-Mach-O/ObjectContainerUniversalMachO.h"		#include "Plugins/ObjectContainer/Universal-Mach-O/ObjectContainerUniversalMachO.h"
#include "Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.h"		#include "Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.h"
#include "Plugins/ObjectFile/ELF/ObjectFileELF.h"		#include "Plugins/ObjectFile/ELF/ObjectFileELF.h"
#include "Plugins/ObjectFile/Mach-O/ObjectFileMachO.h"		#include "Plugins/ObjectFile/Mach-O/ObjectFileMachO.h"
#include "Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.h"		#include "Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.h"
		#include "Plugins/ObjectFile/wasm/ObjectFileWasm.h"
		sbc100 (Done): Can you name this directory "wasm" rather than "WASM", since it's not an acronym?
#include "Plugins/OperatingSystem/Python/OperatingSystemPython.h"		#include "Plugins/OperatingSystem/Python/OperatingSystemPython.h"
#include "Plugins/Platform/Android/PlatformAndroid.h"		#include "Plugins/Platform/Android/PlatformAndroid.h"
#include "Plugins/Platform/FreeBSD/PlatformFreeBSD.h"		#include "Plugins/Platform/FreeBSD/PlatformFreeBSD.h"
#include "Plugins/Platform/Linux/PlatformLinux.h"		#include "Plugins/Platform/Linux/PlatformLinux.h"
#include "Plugins/Platform/MacOSX/PlatformMacOSX.h"		#include "Plugins/Platform/MacOSX/PlatformMacOSX.h"
#include "Plugins/Platform/MacOSX/PlatformRemoteiOS.h"		#include "Plugins/Platform/MacOSX/PlatformRemoteiOS.h"
#include "Plugins/Platform/NetBSD/PlatformNetBSD.h"		#include "Plugins/Platform/NetBSD/PlatformNetBSD.h"
#include "Plugins/Platform/OpenBSD/PlatformOpenBSD.h"		#include "Plugins/Platform/OpenBSD/PlatformOpenBSD.h"
[88 lines not shown]
llvm::Error SystemInitializerFull::Initialize() {		llvm::Error SystemInitializerFull::Initialize() {
if (auto e = SystemInitializerCommon::Initialize())		if (auto e = SystemInitializerCommon::Initialize())
return e;		return e;

breakpad::ObjectFileBreakpad::Initialize();		breakpad::ObjectFileBreakpad::Initialize();
ObjectFileELF::Initialize();		ObjectFileELF::Initialize();
ObjectFileMachO::Initialize();		ObjectFileMachO::Initialize();
ObjectFilePECOFF::Initialize();		ObjectFilePECOFF::Initialize();
		wasm::ObjectFileWasm::Initialize();
		sbc100 (Done): Why is the namespace needed here for wasm but not for the other three above? It seems inconsistent.
		labath (Done): Some of the older code puts "plugins" into the default namespace, but lately we've started to put new plugins into their own namespaces. However, most of the old plugins have not been migrated yet.

ObjectContainerBSDArchive::Initialize();		ObjectContainerBSDArchive::Initialize();
ObjectContainerUniversalMachO::Initialize();		ObjectContainerUniversalMachO::Initialize();

ScriptInterpreterNone::Initialize();		ScriptInterpreterNone::Initialize();

#ifndef LLDB_DISABLE_PYTHON		#ifndef LLDB_DISABLE_PYTHON
OperatingSystemPython::Initialize();		OperatingSystemPython::Initialize();
[207 lines not shown]
#if defined(__APPLE__)
PlatformiOSSimulator::Terminate();		PlatformiOSSimulator::Terminate();
PlatformDarwinKernel::Terminate();		PlatformDarwinKernel::Terminate();
#endif		#endif

breakpad::ObjectFileBreakpad::Terminate();		breakpad::ObjectFileBreakpad::Terminate();
ObjectFileELF::Terminate();		ObjectFileELF::Terminate();
ObjectFileMachO::Terminate();		ObjectFileMachO::Terminate();
ObjectFilePECOFF::Terminate();		ObjectFilePECOFF::Terminate();
		wasm::ObjectFileWasm::Terminate();

ObjectContainerBSDArchive::Terminate();		ObjectContainerBSDArchive::Terminate();
ObjectContainerUniversalMachO::Terminate();		ObjectContainerUniversalMachO::Terminate();

// Now shutdown the common parts, in reverse order.		// Now shutdown the common parts, in reverse order.
SystemInitializerCommon::Terminate();		SystemInitializerCommon::Terminate();
}		}

lldb/source/Plugins/ObjectFile/CMakeLists.txt

	add_subdirectory(Breakpad)			add_subdirectory(Breakpad)
	add_subdirectory(ELF)			add_subdirectory(ELF)
	add_subdirectory(Mach-O)			add_subdirectory(Mach-O)
	add_subdirectory(PECOFF)			add_subdirectory(PECOFF)
	add_subdirectory(JIT)			add_subdirectory(JIT)
				add_subdirectory(wasm)
				No newline at end of file

lldb/source/Plugins/ObjectFile/wasm/CMakeLists.txt

This file was added.

				add_lldb_library(lldbPluginObjectFileWasm PLUGIN
				ObjectFileWasm.cpp

				LINK_LIBS
				lldbCore
				lldbHost
				lldbSymbol
				lldbUtility
				LINK_COMPONENTS
				Support
				)

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.h

This file was added.

				//===-- ObjectFileWasm.h ---------------------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLDB_PLUGINS_OBJECTFILE_WASM_OBJECTFILEWASM_H
				#define LLDB_PLUGINS_OBJECTFILE_WASM_OBJECTFILEWASM_H

				#include "lldb/Symbol/ObjectFile.h"
				#include "lldb/Utility/ArchSpec.h"

				namespace lldb_private {
				namespace wasm {

				/// Generic Wasm object file reader.
				///
				aprantl (Done): This line is redundant and can be removed.
				/// This class provides a generic wasm32 reader plugin implementing the
				/// ObjectFile protocol.
				class ObjectFileWasm : public ObjectFile {
				public:
				static void Initialize();
				static void Terminate();

				aprantl (Done): This comment is inconsistent with the others.
				static ConstString GetPluginNameStatic();
				static const char *GetPluginDescriptionStatic() {
				return "WebAssembly object file reader.";
				}

				static ObjectFile *
				CreateInstance(const lldb::ModuleSP &module_sp, lldb::DataBufferSP &data_sp,
				lldb::offset_t data_offset, const FileSpec *file,
				lldb::offset_t file_offset, lldb::offset_t length);

				static ObjectFile *CreateMemoryInstance(const lldb::ModuleSP &module_sp,
				lldb::DataBufferSP &data_sp,
				const lldb::ProcessSP &process_sp,
				lldb::addr_t header_addr);

				static size_t GetModuleSpecifications(const FileSpec &file,
				lldb::DataBufferSP &data_sp,
				lldb::offset_t data_offset,
				lldb::offset_t file_offset,
				lldb::offset_t length,
				ModuleSpecList &specs);

				/// PluginInterface protocol.
				/// \{
				ConstString GetPluginName() override { return GetPluginNameStatic(); }
				uint32_t GetPluginVersion() override { return 1; }
				/// \}

				/// ObjectFile Protocol.
				/// \{
				bool ParseHeader() override;

				lldb::ByteOrder GetByteOrder() const override {
				return m_arch.GetByteOrder();
				}

				bool IsExecutable() const override { return true; }

				uint32_t GetAddressByteSize() const override {
				return m_arch.GetAddressByteSize();
				}

				AddressClass GetAddressClass(lldb::addr_t file_addr) override {
				return AddressClass::eInvalid;
				}

				Symtab *GetSymtab() override;

				bool IsStripped() override { return true; }

				void CreateSections(SectionList &unified_section_list) override;

				void Dump(Stream *s) override;

				ArchSpec GetArchitecture() override { return m_arch; }

				UUID GetUUID() override { return m_uuid; }

				uint32_t GetDependentModules(FileSpecList &files) override { return 0; }

				Type CalculateType() override { return eTypeExecutable; }

				Strata CalculateStrata() override { return eStrataUser; }

				bool SetLoadAddress(lldb_private::Target &target, lldb::addr_t value,
				bool value_is_offset) override;

				lldb_private::Address GetBaseAddress() override {
				return IsInMemory() ? Address(m_memory_addr + m_code_section_offset)
				: Address(m_code_section_offset);
				}
				/// \}

				private:
				ObjectFileWasm(const lldb::ModuleSP &module_sp, lldb::DataBufferSP &data_sp,
				lldb::offset_t data_offset, const FileSpec *file,
				lldb::offset_t offset, lldb::offset_t length);
				ObjectFileWasm(const lldb::ModuleSP &module_sp,
				lldb::DataBufferSP &header_data_sp,
				const lldb::ProcessSP &process_sp, lldb::addr_t header_addr);

				/// Wasm section decoding routines.
				/// \{
				bool DecodeNextSection(lldb::offset_t *offset_ptr);
				bool DecodeSections();
				/// \}

				/// Read a range of bytes from the Wasm module.
				DataExtractor ReadImageData(uint64_t offset, size_t size);

				aprantl (Done): Please make sure to use full sentences that end in a `.` in all comments.
				typedef struct section_info {
				labath (Done): We don't use this typedef style (except possibly in some old code which we shouldn't emulate).
				paolosev (Author, Done): Yes, this was copied from old code :). Removed.
				lldb::offset_t offset;
				uint32_t size;
				uint32_t id;
				ConstString name;
				} section_info_t;

				/// Wasm section header dump routines.
				/// \{
				void DumpSectionHeader(llvm::raw_ostream &ostream, const section_info_t &sh);
				void DumpSectionHeaders(llvm::raw_ostream &ostream);
				/// \}

				std::vector<section_info_t> m_sect_infos;
				ArchSpec m_arch;
				UUID m_uuid;
				uint32_t m_code_section_offset;
				};

				} // namespace wasm
				} // namespace lldb_private
				#endif // LLDB_PLUGINS_OBJECTFILE_WASM_OBJECTFILEWASM_H

lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp

This file was added.

				//===-- ObjectFileWasm.cpp ------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "ObjectFileWasm.h"
				#include "lldb/Core/Module.h"
				#include "lldb/Core/ModuleSpec.h"
				#include "lldb/Core/PluginManager.h"
				#include "lldb/Core/Section.h"
				#include "lldb/Target/Process.h"
				#include "lldb/Target/SectionLoadList.h"
				#include "lldb/Target/Target.h"
				#include "lldb/Utility/DataBufferHeap.h"
				#include "lldb/Utility/Log.h"
				#include "llvm/ADT/ArrayRef.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/BinaryFormat/Magic.h"
				#include "llvm/BinaryFormat/Wasm.h"
				#include "llvm/Support/Endian.h"
				#include "llvm/Support/Format.h"

				using namespace lldb;
				using namespace lldb_private;
				using namespace lldb_private::wasm;

				static const uint32_t kWasmHeaderSize =
				sizeof(llvm::wasm::WasmMagic) + sizeof(llvm::wasm::WasmVersion);

				/// Checks whether the data buffer starts with a valid Wasm module header.
				static bool ValidateModuleHeader(const DataBufferSP &data_sp) {
				clayborg (Not Done): I typically would put "data_sp" into a DataExtractor and extract a uint32_t and then check the decoded value with something like:

				    static bool ValidateModuleHeader(DataExtractor &data, uint64_t *offset_ptr) {
				      auto magic = data.GetU32(offset_ptr);
				      if (magic == WASM_MAGIC)
				        return true;
				      if (magic == WASM_CIGAM) {
				        // Set byte order in DataExtractor
				        data.SetByteOrder(data.GetByteOrder() == eByteOrderBig ? eByteOrderLittle : eByteOrderBig);
				        return true;
				      }
				      return false;
				    }

				This function expects a DataExtractor to be passed in that has "data_sp" inside of it with the host endian set as the byte order. It will set the byte order correctly. It also expects to have two uint32_t macros defined: WASM_MAGIC and WASM_CIGAM. These contain the non byte swapped and the byte swapped magic values. Easy to replace those with real definitions from elsewhere (llvm::file_magic::wasm_object? Not sure of the type of this though, seemed like a StringRef).
				paolosev (Author, Done): Not sure about this... WebAssembly is always little-endian and a DataExtractor is not really needed here. I made sure to set the byte order where the DataExtractor is created in `ReadImageData`.
				if (!data_sp || data_sp->GetByteSize() < kWasmHeaderSize)
				return false;

				if (llvm::identify_magic(toStringRef(data_sp->GetData())) !=
				labath (Done): `if (llvm::identify_magic(toStringRef(data_sp->GetData())) != llvm::file_magic::wasm_object)` maybe?
				llvm::file_magic::wasm_object)
				return false;
				labath (Done): This looks like it will cause problems on big endian hosts.
				labath (Done): One way to get around that would be to use something like `llvm::support::read32le`.
				paolosev (Author, Done): Good point, modified to use read32le.

				uint8_t *Ptr = data_sp->GetBytes() + sizeof(llvm::wasm::WasmMagic);

				uint32_t version = llvm::support::endian::read32le(Ptr);
				clayborg (Done): Use DataExtractor::GetU32()? Or is the byte order always little endian for wasm object files?
				return version == llvm::wasm::WasmVersion;
				aprantl (Done): The LLVM coding style requests that doxygen comments should be on the declaration in the header file and not in the implementation.
				paolosev (Author, Done): This function is not declared in the header. Probably it doesn't need a doxygen comment.
				aprantl (Done): If it isn't declared in a header, then it should be either static or in an anonymous namespace. Using a doxygen comment is still preferred, since many IDEs will use that to display online help.
				}
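The header check above relies on two facts about the binary format: a module starts with the four bytes `\0asm`, followed by a 32-bit little-endian version (currently 1). The endianness concern raised in the inline comments can be seen in a standalone sketch; `ReadLE32` and `LooksLikeWasmModule` are hypothetical names for this example and only mirror the idea of `llvm::support::endian::read32le`, not its implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Reads a 32-bit little-endian value regardless of host byte order, by
// assembling it from individual bytes instead of casting the pointer.
inline uint32_t ReadLE32(const uint8_t *p) {
  return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
         (uint32_t(p[3]) << 24);
}

// Returns true if the buffer starts with "\0asm" followed by version 1.
inline bool LooksLikeWasmModule(const uint8_t *data, size_t size) {
  static const uint8_t kMagic[4] = {0x00, 'a', 's', 'm'};
  if (data == nullptr || size < 8)
    return false;
  for (size_t i = 0; i < 4; ++i)
    if (data[i] != kMagic[i])
      return false;
  return ReadLE32(data + 4) == 1;
}
```

Reading the version byte by byte gives the same result on big-endian and little-endian hosts, which a direct `*(uint32_t *)` load would not.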

				static llvm::Optional<ConstString>
				GetWasmString(llvm::DataExtractor &data, llvm::DataExtractor::Cursor &c) {
				clayborg (Done): Is it ok if we consume more than 1 byte here? What if the offset points to a larger ULEB: are we ok with advancing the offset by multiple bytes, or should we back it up and return llvm::None? This might be a good candidate to add to DataExtractor directly as:

				    /// Extract a ULEB128 number with a specified max value. If the extracted value exceeds
				    /// "max_value" the offset will be left unchanged and llvm::None will be returned.
				    llvm::Optional<uint8_t> DataExtractor::GetULEB128(uint64_t *offset_ptr, uint64_t max_value);

				There are many places where we extract a uint64_t, but only need a uint16_t (like in the DWARF parser, where all DW_TAG_XXXX, DW_AT_XXX and DW_FORM_XXX values must only be uint16_t values but are encoded as ULEB128 values). So this could be used elsewhere if we do put it into DataExtractor.
				paolosev (Author, Done): Modified in DataExtractor, as suggested. However, we need to return an `llvm::Optional<uint64_t>`, which is not ideal because the result may need to be cast to another integer type. We could also make `DataExtractor::GetULEB128` templatized on the integer type, getting `max_value` as `std::numeric_limits<int>::max()`, but I don't want to overly complicate the code.
				// A Wasm string is encoded as a vector of UTF-8 codes.
				// Vectors are encoded with their u32 length followed by the element
				// sequence.
				uint64_t len = data.getULEB128(c);
				if (!c) {
				consumeError(c.takeError());
				return llvm::None;
				}

				if (len >= (uint64_t(1) << 32)) {
				clayborg (Done): Remove if we add: `llvm::Optional<uint8_t> DataExtractor::GetULEB128(uint64_t *offset_ptr, uint64_t max_value);`
				paolosev (Author, Done): Actually, section_id is encoded as a byte, not as a varuint7, so I modified the code accordingly, with `llvm::Optional<uint8_t> GetByte(DataExtractor &, lldb::offset_t *)`. Maybe even this could be moved to DataExtractor?
				return llvm::None;
				}

				llvm::SmallVector<uint8_t, 32> str_storage;
				labath (Not Done): Maybe merge these and make the maximum width a function argument?
				paolosev (Author, Done): I'd keep the functions separate, it's better if they return different sized integers.
				data.getU8(c, str_storage, len);
				if (!c) {
				consumeError(c.takeError());
				return llvm::None;
				}

				llvm::StringRef str = toStringRef(makeArrayRef(str_storage));
				return ConstString(str);
				}
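GetWasmString above decodes a Wasm name, which is a ULEB128 byte count followed by that many UTF-8 bytes with no NUL terminator. The same layout can be decoded with only the standard library, as in the sketch below. `DecodeULEB128` and `DecodeWasmName` are hypothetical helpers for this example; they report failure through a `bool` rather than the `llvm::Optional` and cursor used in the patch, and they do not validate UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes an unsigned LEB128 value at `pos`. On success, stores the value in
// `out` and advances `pos`; on failure, leaves both untouched.
inline bool DecodeULEB128(const std::vector<uint8_t> &buf, size_t &pos,
                          uint64_t &out) {
  uint64_t value = 0;
  size_t p = pos;
  for (unsigned shift = 0; shift < 64; shift += 7) {
    if (p >= buf.size())
      return false;
    uint8_t byte = buf[p++];
    value |= uint64_t(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      pos = p;
      out = value;
      return true;
    }
  }
  return false;
}

// Decodes a Wasm name: a ULEB128 byte count followed by that many bytes.
// There is no NUL terminator, so the length prefix is the only delimiter.
inline bool DecodeWasmName(const std::vector<uint8_t> &buf, size_t &pos,
                           std::string &name) {
  size_t p = pos;
  uint64_t len = 0;
  if (!DecodeULEB128(buf, p, len))
    return false;
  if (len > buf.size() - p) // Name would run past the end of the buffer.
    return false;
  size_t n = static_cast<size_t>(len);
  name.assign(reinterpret_cast<const char *>(buf.data() + p), n);
  pos = p + n;
  return true;
}
```

Leaving `pos` unchanged on failure addresses the question raised in the review about whether a failed read should still advance the offset.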

				void ObjectFileWasm::Initialize() {
				PluginManager::RegisterPlugin(GetPluginNameStatic(),
				GetPluginDescriptionStatic(), CreateInstance,
				CreateMemoryInstance, GetModuleSpecifications);
				}

				void ObjectFileWasm::Terminate() {
				PluginManager::UnregisterPlugin(CreateInstance);
				}

				ConstString ObjectFileWasm::GetPluginNameStatic() {
				static ConstString g_name("wasm");
				return g_name;
				}

				ObjectFile *
				ObjectFileWasm::CreateInstance(const ModuleSP &module_sp, DataBufferSP &data_sp,
				offset_t data_offset, const FileSpec *file,
				offset_t file_offset, offset_t length) {
				Log *log(GetLogIfAllCategoriesSet(LIBLLDB_LOG_OBJECT));

				if (!data_sp) {
				data_sp = MapFileData(*file, length, file_offset);
				if (!data_sp) {
				LLDB_LOGF(log, "Failed to create ObjectFileWasm instance for file %s",
				file->GetPath().c_str());
				return nullptr;
				}
				data_offset = 0;
				}

				assert(data_sp);
				if (!ValidateModuleHeader(data_sp)) {
				LLDB_LOGF(log,
				"Failed to create ObjectFileWasm instance: invalid Wasm header");
				return nullptr;
				}

				// Update the data to contain the entire file if it doesn't contain it
				// already.
				if (data_sp->GetByteSize() < length) {
				data_sp = MapFileData(*file, length, file_offset);
				if (!data_sp) {
				LLDB_LOGF(log,
				"Failed to create ObjectFileWasm instance: cannot read file %s",
				file->GetPath().c_str());
				return nullptr;
				}
				data_offset = 0;
				}

				std::unique_ptr<ObjectFileWasm> objfile_up(new ObjectFileWasm(
				module_sp, data_sp, data_offset, file, file_offset, length));
				ArchSpec spec = objfile_up->GetArchitecture();
				if (spec && objfile_up->SetModulesArchitecture(spec)) {
				LLDB_LOGF(log,
				"%p ObjectFileWasm::CreateInstance() module = %p (%s), file = %s",
				static_cast<void *>(objfile_up.get()),
				static_cast<void *>(objfile_up->GetModule().get()),
				objfile_up->GetModule()->GetSpecificationDescription().c_str(),
				file ? file->GetPath().c_str() : "<NULL>");
				return objfile_up.release();
				}

				LLDB_LOGF(log, "Failed to create ObjectFileWasm instance");
				return nullptr;
				}

				ObjectFile *ObjectFileWasm::CreateMemoryInstance(const ModuleSP &module_sp,
				DataBufferSP &data_sp,
				const ProcessSP &process_sp,
				addr_t header_addr) {
				if (!ValidateModuleHeader(data_sp))
				return nullptr;

				std::unique_ptr<ObjectFileWasm> objfile_up(
				new ObjectFileWasm(module_sp, data_sp, process_sp, header_addr));
				ArchSpec spec = objfile_up->GetArchitecture();
				if (spec && objfile_up->SetModulesArchitecture(spec))
				return objfile_up.release();
				return nullptr;
				}

				bool ObjectFileWasm::DecodeNextSection(lldb::offset_t *offset_ptr) {
				// Buffer sufficient to read a section header and find the pointer to the next
				// section.
				const uint32_t kBufferSize = 1024;
				DataExtractor section_header_data = ReadImageData(*offset_ptr, kBufferSize);

				llvm::DataExtractor data = section_header_data.GetAsLLVM();
				clayborg (Done): Is this a one-byte section ID or is it a ULEB? Not sure why it would be encoded as a ULEB if it is always one byte. If this really is just a one-byte value, then replace with: `uint8_t section_id = data.GetU8(&offset);`
				paolosev (Author, Done): You are right, it is a one-byte section id. I cannot use `data.GetU8(&offset);` though, because it returns 0 on failure.
				llvm::DataExtractor::Cursor c(0);

				// Each section consists of:
				// - a one-byte section id,
				// - the u32 size of the contents, in bytes,
				// - the actual contents.
				uint8_t section_id = data.getU8(c);
				uint64_t payload_len = data.getULEB128(c);
				if (!c)
				return !llvm::errorToBool(c.takeError());

				if (payload_len >= (uint64_t(1) << 32))
				return false;

				if (section_id == llvm::wasm::WASM_SEC_CUSTOM) {
				lldb::offset_t prev_offset = c.tell();
				llvm::Optional<ConstString> sect_name = GetWasmString(data, c);
				if (!sect_name)
				return false;

				if (payload_len < c.tell() - prev_offset)
				return false;

				uint32_t section_length = payload_len - (c.tell() - prev_offset);
				m_sect_infos.push_back(section_info{*offset_ptr + c.tell(), section_length,
				section_id, *sect_name});
				*offset_ptr += (c.tell() + section_length);
				} else if (section_id <= llvm::wasm::WASM_SEC_EVENT) {
				m_sect_infos.push_back(section_info{*offset_ptr + c.tell(),
				labath (Done): I'd just use StringRef here; there's no advantage in ConstStringifying this...
				static_cast<uint32_t>(payload_len),
				section_id, ConstString()});
				*offset_ptr += (c.tell() + payload_len);
				clayborg (Done): All these lines can use the GetCStr with a length: `ConstString sect_name(data.GetCStr(&offset, name_len)); if (!sect_name) return false;`
				paolosev (Author, Done): I had tried to use `DataExtractor::GetCStr()` but it doesn't work because it wants null-terminated strings, and this is not the case in Wasm, where strings are encoded as len (varuint32) followed by an array of len bytes that represent UTF8 chars (https://webassembly.github.io/spec/core/binary/values.html#names).
				} else {
				// Invalid section id.
				return false;
				}
				clayborg (Done): Remove ConstString constructor here if we switch code above as suggested.
				return true;
				}
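DecodeNextSection above follows the section layout of the binary format: a one-byte id, a u32 payload size encoded as ULEB128, then the payload. A whole-buffer version of that walk is sketched below. It is a simplification under stated assumptions: the module is fully in memory, custom section names are not parsed, and `SectionHeader` and `ListSections` are hypothetical names that do not appear in the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one section: its id and where its payload lives.
struct SectionHeader {
  uint8_t id;
  size_t payload_offset;
  uint32_t payload_size;
};

// Walks the sections that follow the 8-byte module header. Stops at the
// first truncated or malformed section and returns what was decoded so far.
inline std::vector<SectionHeader>
ListSections(const std::vector<uint8_t> &module) {
  std::vector<SectionHeader> sections;
  size_t pos = 8; // Skip magic and version.
  while (pos < module.size()) {
    uint8_t id = module[pos++];
    // Decode the payload size: a u32 in ULEB128 form, at most 5 bytes.
    uint64_t size = 0;
    unsigned shift = 0;
    bool done = false;
    while (pos < module.size() && shift < 35) {
      uint8_t byte = module[pos++];
      size |= uint64_t(byte & 0x7f) << shift;
      shift += 7;
      if ((byte & 0x80) == 0) {
        done = true;
        break;
      }
    }
    if (!done || size > 0xffffffffu || size > module.size() - pos)
      break;
    sections.push_back({id, pos, static_cast<uint32_t>(size)});
    pos += static_cast<size_t>(size); // Jump over the payload.
  }
  return sections;
}
```

Checking the payload size against the bytes actually remaining, rather than only against 2^32, is in the spirit of the reviewer's suggestion to prefer bounds checks that reflect how the value is used.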

				bool ObjectFileWasm::DecodeSections() {
				lldb::offset_t offset = kWasmHeaderSize;
				if (IsInMemory()) {
				offset += m_memory_addr;
				}

				while (DecodeNextSection(&offset))
				;
				return true;
				}

				size_t ObjectFileWasm::GetModuleSpecifications(
				const FileSpec &file, DataBufferSP &data_sp, offset_t data_offset,
				labath (Done): This seems odd. I don't think any of our object file plugins work this way. It's normally the symbol vendor who fiddles with the symbol file spec. This is kind of similar to the gnu_debuglink section, and the way that works in elf is that the object file exposes this via a separate method, which the symbol vendor can then query and do the appropriate thing. Maybe you could just drop this part and we can get back to it with the symbol vendor patch?
				offset_t file_offset, offset_t length, ModuleSpecList &specs) {
				if (!ValidateModuleHeader(data_sp)) {
				return 0;
				}

				ModuleSpec spec(file, ArchSpec("wasm32-unknown-unknown-wasm"));
				specs.Append(spec);
				return 1;
				}

				labath (Not Done): I wouldn't be afraid of presenting `external_debug_info` as an actual section, if that's how it's treated by the object file format. And it looks like that could simplify this code a bit...
				ObjectFileWasm::ObjectFileWasm(const ModuleSP &module_sp, DataBufferSP &data_sp,
				offset_t data_offset, const FileSpec *file,
				offset_t offset, offset_t length)
				: ObjectFile(module_sp, file, offset, length, data_sp, data_offset),
				m_arch("wasm32-unknown-unknown-wasm"), m_code_section_offset(0) {
				m_data.SetAddressByteSize(4);
				}

				ObjectFileWasm::ObjectFileWasm(const lldb::ModuleSP &module_sp,
				lldb::DataBufferSP &header_data_sp,
				const lldb::ProcessSP &process_sp,
				lldb::addr_t header_addr)
				: ObjectFile(module_sp, process_sp, header_addr, header_data_sp),
				m_arch("wasm32-unknown-unknown-wasm"), m_code_section_offset(0) {}

				bool ObjectFileWasm::ParseHeader() {
				  // We already parsed the header during initialization.
				  return true;
				}

				Symtab *ObjectFileWasm::GetSymtab() { return nullptr; }

				void ObjectFileWasm::CreateSections(SectionList &unified_section_list) {
				  if (m_sections_up)
				    return;

				  m_sections_up = std::make_unique<SectionList>();
				[Inline comment] labath (Done): I take it that wasm files don't have anything like a build id, uuid or something similar?
				[Inline comment] paolosev (Author, Done): Not yet. There is a proposal to add a uuid to wasm modules in a custom section, but it's not part of the standard yet.
				[Inline comment] labath (Done): Ok. Thanks for clarifying. I think something like that will be useful, as otherwise you can't tell if you're using the correct separate debug info file.

				  if (m_sect_infos.empty()) {
				    DecodeSections();
				  }

				  for (const section_info &sect_info : m_sect_infos) {
				    SectionType section_type = eSectionTypeOther;
				    ConstString section_name;
				    offset_t file_offset = 0;
				    addr_t vm_addr = 0;
				    size_t vm_size = 0;

				    if (llvm::wasm::WASM_SEC_CODE == sect_info.id) {
				      section_type = eSectionTypeCode;
				      section_name = ConstString("code");
				      m_code_section_offset = sect_info.offset & 0xffffffff;
				      vm_size = sect_info.size;
				    } else {
				      section_type =
				          llvm::StringSwitch<SectionType>(sect_info.name.GetStringRef())
				              .Case(".debug_abbrev", eSectionTypeDWARFDebugAbbrev)
				              .Case(".debug_addr", eSectionTypeDWARFDebugAddr)
				              .Case(".debug_aranges", eSectionTypeDWARFDebugAranges)
				              .Case(".debug_cu_index", eSectionTypeDWARFDebugCuIndex)
				              .Case(".debug_frame", eSectionTypeDWARFDebugFrame)
				              .Case(".debug_info", eSectionTypeDWARFDebugInfo)
				              .Case(".debug_line", eSectionTypeDWARFDebugLine)
				              .Case(".debug_line_str", eSectionTypeDWARFDebugLineStr)
				              .Case(".debug_loc", eSectionTypeDWARFDebugLoc)
				              .Case(".debug_loclists", eSectionTypeDWARFDebugLocLists)
				              .Case(".debug_macinfo", eSectionTypeDWARFDebugMacInfo)
				              .Case(".debug_macro", eSectionTypeDWARFDebugMacro)
				              .Case(".debug_names", eSectionTypeDWARFDebugNames)
				              .Case(".debug_pubnames", eSectionTypeDWARFDebugPubNames)
				              .Case(".debug_pubtypes", eSectionTypeDWARFDebugPubTypes)
				              .Case(".debug_ranges", eSectionTypeDWARFDebugRanges)
				              .Case(".debug_rnglists", eSectionTypeDWARFDebugRngLists)
				              .Case(".debug_str", eSectionTypeDWARFDebugStr)
				              .Case(".debug_str_offsets", eSectionTypeDWARFDebugStrOffsets)
				              .Case(".debug_types", eSectionTypeDWARFDebugTypes)
				              .Default(eSectionTypeOther);
				      if (section_type == eSectionTypeOther)
				        continue;
				      section_name = sect_info.name;
				      file_offset = sect_info.offset & 0xffffffff;
				      if (IsInMemory()) {
				        vm_addr = sect_info.offset & 0xffffffff;
				        vm_size = sect_info.size;
				      }
				    }

				    SectionSP section_sp(
				        new Section(GetModule(),    // Module to which this section belongs.
				                    this,           // ObjectFile to which this section belongs and
				                                    // should read section data from.
				                    section_type,   // Section ID.
				                    section_name,   // Section name.
				                    section_type,   // Section type.
				                    vm_addr,        // VM address.
				                    vm_size,        // VM size in bytes of this section.
				                    file_offset,    // Offset of this section in the file.
				                    sect_info.size, // Size of the section as found in the file.
				                    0,              // Alignment of the section
				                    0,              // Flags for this section.
				                    1));            // Number of host bytes per target byte
				    m_sections_up->AddSection(section_sp);
				    unified_section_list.AddSection(section_sp);
				  }
				}

				bool ObjectFileWasm::SetLoadAddress(Target &target, lldb::addr_t load_address,
				                                    bool value_is_offset) {
				  /// In WebAssembly, linear memory is disjointed from code space. The VM can
				  /// load multiple instances of a module, which logically share the same code.
				  /// We represent a wasm32 code address with 64-bits, like:
				  /// 63            32 31             0
				  /// +---------------+---------------+
				  /// +   module_id   |     offset    |
				  /// +---------------+---------------+
				  /// where the lower 32 bits represent a module offset (relative to the module
				  /// start not to the beginning of the code section) and the higher 32 bits
				  /// uniquely identify the module in the WebAssembly VM.
				  /// In other words, we assume that each WebAssembly module is loaded by the
				  /// engine at a 64-bit address that starts at the boundary of 4GB pages, like
				  /// 0x0000000400000000 for module_id == 4.
				  /// These 64-bit addresses will be used to request code ranges for a specific
				  /// module from the WebAssembly engine.
				  ModuleSP module_sp = GetModule();
				  if (!module_sp)
				    return false;

				[Inline comment] labath (Done): Are the debug info sections actually loaded into memory for wasm? Should these be zero (that's what they are for elf)?
				[Inline comment] paolosev (Author, Done): Yes, I was thinking that the debug sections should be loaded into memory; not sure how this works for ELF, how does the debugger find the debug info in that case?
				[Inline comment] labath (Not Done): Are you referring to the load-from-memory, or load-from-file scenario? Normally, the debug info is not loaded into memory, and if lldb is loading the module from a file, then the debug info is loaded from the file (and we use the "file offset" field to locate them). Loading files from memory does not work in general (because we can't find the whole file there -- e.g., section headers are missing). The only case it does work is in case of jitted files. I'm not 100% sure how that works, but given that there is no `if(memory)` block in the section creation code, those sections must also get `vm size = 0`.
				In practice, I don't think it matters that much what you put here -- I expect things will mostly work regardless. I am just trying to make this consistent in some way. If these sections are not found in the memory at the address which you set here (possibly adjusted by the load bias) then I think it makes sense to set `vm_size=vm_addr=0`. If they are indeed present there, then setting it to those values is perfectly fine.
				[Inline comment] paolosev (Author, Done): Modified, but I am not sure I completely understood the difference between file addresses and vm addresses. In the case of Wasm, we can have two cases:
				a) The Wasm module contains executable code and also DWARF data embedded in DWARF sections.
				b) The Wasm module contains code and has a custom section that points to a separate Wasm module that only contains DWARF sections.
				The file of a Wasm module that contains code should never be loaded directly by LLDB; LLDB should use GDB-remote to connect to the Wasm engine that loaded the module and retrieve the module content from the engine. But when the DWARF data has been stripped into a separate Wasm file, that file should be directly loaded by LLDB in order to load the debug sections. So, if I am not wrong, only in the first case should we assume that the debug info is loaded into memory, and have vm_size != 0?
				[Inline comment] labath (Done): You're right, this is getting pretty confusing, as a lot of the concepts we're talking about here are overloaded (and not even consistently used within lldb). Let me try to elaborate here. I'll start by describing various Section fields (I'll use ELF as a reference):
				- file offset (`m_file_offset`): where the section is physically located in the file. It does not matter if the file is loaded from inferior memory or not (in the former case, this is an offset from the location of the first byte of the file). In elf this corresponds to the `sh_offset` field.
				- file size (`m_file_size`): size of the section in the file. Some sections don't take up any space in the file (`.bss`), so they have this zero. This is roughly the `sh_size` field in elf (but it is adjusted for SHT_NOBITS sections like .bss).
				- file address (`m_file_addr`): the address where the object file says this section should be loaded to. Note that this may not be the final address due to ASLR or such. It is the job of the SetLoadAddress function to compute the actual final address (which is then called the "load address"). This corresponds to the `sh_addr` field in elf, but it is also sometimes called the "vm address", because of the `p_vaddr` program header field.
				- vm size (`m_byte_size`): size of the section when it gets loaded into memory. Sections which don't get loaded into memory have this as 0. This is also "roughly" `sh_size`, but only for SHF_ALLOC sections.
				All of these fields are really geared for the case where lldb opens a file from disk, and then uses that to understand the contents of process memory. They become pretty redundant if you have loaded the file from memory, but it should still be possible to assign meaningful values to each of them. The file offset+file size combo should reflect the locations of the sections within the file (regardless of whether it's a "real" file or just a chunk of memory). And the file address+vm size combo should reflect the locations of the sections in memory (modulo ASLR, and again independently of where lldb happened to read the file from).
				Looking at the patch, I think you've got most of these right. The only question is, what is the actual memory layout of wasm debug info in the inferior process. I originally highlighted this because I was assuming the typical elf model where the debug info is not loaded into memory and the debugger loads it from a file (jitted files are an afterthought in elf, and not that well supported/tested). However, now I get the impression that the wasm debug info is actually always loaded into memory (assuming it is present, that is), and so setting the vm size to non zero might actually be correct here. You'll need to make the call there, as I don't know how wasm actually works. (Note that you needn't concern yourself much with separate debug files here -- the vm size only really matters once you call SetLoadAddress to actually mark the object file as loaded into the process, and we never call SetLoadAddress on a separate debug file).
				[Inline comment] paolosev (Author, Done): Thanks for the clarification! WebAssembly modules are not really loaded into the memory of the inferior process like native executables are. The Wasm/JavaScript engine loads the module and executes it, either interpreting the Wasm bytecode or jit-compiling it into native code. Furthermore, in WebAssembly the code is distinct from the addressable memory. All this poses some interesting problems to enable debugging with LLDB. My idea is that a Wasm engine that wants to support LLDB debugging should implement a GDB remote stub and LLDB would not directly access the inferior memory but only access it by talking with the engine through the remote protocol.
				As for debug info, by default Clang embeds it in a few custom sections in the Wasm module itself. But normally these debug sections should be stripped into a separate Wasm file (we should modify llvm-objcopy for this) because they are not useful in the engine and take a lot of space. Usually an engine loads and keeps the whole Wasm module in memory, so when the debug info is embedded in the module, we can say that the debug info is loaded into the inferior process but it’s only accessible through the engine.
				Here SetLoadAddress will be used to specify at which ‘virtual address’ the engine loaded the module; for example a module with id==4 will be loaded at address 0x00000004`00000000 (logic for this will be implemented in a new class DynamicLoaderWasm). LLDB will use gdb-remote commands like ‘m’ to read chunks of the debug sections from a loaded Wasm module, and the remote stub identifies the module from the id encoded in the address.
				When the debug info is in a separate file, as you say we don’t call SetLoadAddress on that file, and instead LLDB will use m_file_offset, m_file_size to read the debug info directly from the file (m_file_offset, m_file_size can be 0). The code above should be consistent with this logic.
				[Inline comment] labath (Not Done): Cool, thanks for explaining. I think we've both learned a lot here.
				  DecodeSections();

				  size_t num_loaded_sections = 0;
				  SectionList *section_list = GetSectionList();
				  if (!section_list)
				    return false;

				  const size_t num_sections = section_list->GetSize();
				[Inline comment] labath (Done): It would be nice to merge these two section-creating blocks..
				  size_t sect_idx = 0;

				  for (sect_idx = 0; sect_idx < num_sections; ++sect_idx) {
				    SectionSP section_sp(section_list->GetSectionAtIndex(sect_idx));
				    if (target.GetSectionLoadList().SetSectionLoadAddress(
				            section_sp, load_address | section_sp->GetFileAddress())) {
				      ++num_loaded_sections;
				    }
				  }

				  return num_loaded_sections > 0;
				}
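The address scheme described in the comment at the top of `SetLoadAddress` can be sketched in isolation as below. This is only an illustration of the `module_id`/`offset` packing; the helper names `MakeWasmAddress`, `ModuleIdOf` and `OffsetOf` are hypothetical and do not appear in the patch.

```cpp
#include <cstdint>

// Packs a module id into the upper 32 bits and a module-relative offset into
// the lower 32 bits, as in the layout diagram in SetLoadAddress.
constexpr uint64_t MakeWasmAddress(uint32_t module_id, uint32_t offset) {
  return (static_cast<uint64_t>(module_id) << 32) | offset;
}

// Recovers the module id from a packed 64-bit address.
constexpr uint32_t ModuleIdOf(uint64_t address) {
  return static_cast<uint32_t>(address >> 32);
}

// Recovers the module-relative offset from a packed 64-bit address.
constexpr uint32_t OffsetOf(uint64_t address) {
  return static_cast<uint32_t>(address & 0xffffffffu);
}

// The example from the source comment: module_id == 4 starts at
// 0x0000000400000000.
static_assert(MakeWasmAddress(4, 0) == 0x0000000400000000ULL,
              "module 4 starts on a 4GB boundary");
```

Because a module's load address has its lower 32 bits clear under this scheme, combining it with a 32-bit section file address using bitwise OR, as the loop above does, gives the same result as adding the two.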

				DataExtractor ObjectFileWasm::ReadImageData(uint64_t offset, size_t size) {
				  DataExtractor data;
				  if (m_file) {
				    if (offset < GetByteSize()) {
				      size = std::min(size, GetByteSize() - offset);
				      auto buffer_sp = MapFileData(m_file, size, offset);
				      return DataExtractor(buffer_sp, GetByteOrder(), GetAddressByteSize());
				    }
				  } else {
				    ProcessSP process_sp(m_process_wp.lock());
				    if (process_sp) {
				[Inline comment] aprantl (Done): again.. in the header, or inside the function
				      auto data_up = std::make_unique<DataBufferHeap>(size, 0);
				      Status readmem_error;
				      size_t bytes_read = process_sp->ReadMemory(
				          offset, data_up->GetBytes(), data_up->GetByteSize(), readmem_error);
				      if (bytes_read > 0) {
				        DataBufferSP buffer_sp(data_up.release());
				        data.SetData(buffer_sp, 0, buffer_sp->GetByteSize());
				[Inline comment] labath (Not Done): This is strange.. I wouldn't expect that the section decoding logic should depend on the actual address that the object is loaded in memory. Can you explain the reasoning here?
				[Inline comment] paolosev (Author, Done): There is a reason for this: DecodeNextSection() calls `ReadImageData(offset, size)`, and when the debug info sections are embedded in the wasm module (not loaded from a separate symbol file), ReadImageData() calls Process::ReadMemory() which, for a GDB remote connection, goes to ProcessGDBRemote::DoReadMemory(). Here I pass the offset because it also represents the specific id of the module loaded in the target debuggee process, as explained in the comment above.
				[Inline comment] labath (Done): What you say about ReadMemory makes sense, but it's not clear to me why you couldn't use the value in `m_memory_addr` for this. For an in-memory object file this value should contain the actual address the object file was loaded from (which, I would expect, will include the `module_id` business), and so you wouldn't need the dynamic loader address in order to locate it. I believe this is how the other object file plugins do their in-memory loading..
				[Inline comment] paolosev (Author, Done): Good point! Changed.
				      }
				    }
				  }

				  data.SetByteOrder(GetByteOrder());
				  return data;
				}

				void ObjectFileWasm::Dump(Stream *s) {
				  ModuleSP module_sp(GetModule());
				  if (!module_sp)
				    return;

				[Inline comment] labath (Done): Normally I would expect to see `->GetFileAddress()` here, as that's the thing which says how the sections are laid out in memory. The way you create these sections, the two values are the same, but it still seems more correct to call GetFileAddress()
				  std::lock_guard<std::recursive_mutex> guard(module_sp->GetMutex());

				  llvm::raw_ostream &ostream = s->AsRawOstream();
				  ostream << static_cast<void *>(this) << ": ";
				  s->Indent();
				  ostream << "ObjectFileWasm, file = '";
				  m_file.Dump(ostream);
				  ostream << "', arch = ";
				  ostream << GetArchitecture().GetArchitectureName() << "\n";

				  SectionList *sections = GetSectionList();
				  if (sections) {
				    sections->Dump(s, nullptr, true, UINT32_MAX);
				  }
				  ostream << "\n";
				  DumpSectionHeaders(ostream);
				  ostream << "\n";
				}

				void ObjectFileWasm::DumpSectionHeader(llvm::raw_ostream &ostream,
				                                       const section_info_t &sh) {
				  ostream << llvm::left_justify(sh.name.GetStringRef(), 16) << " "
				          << llvm::format_hex(sh.offset, 10) << " "
				          << llvm::format_hex(sh.size, 10) << " " << llvm::format_hex(sh.id, 6)
				          << "\n";
				}

				void ObjectFileWasm::DumpSectionHeaders(llvm::raw_ostream &ostream) {
				  ostream << "Section Headers\n";
				  ostream << "IDX  name             addr       size       id\n";
				  ostream << "==== ---------------- ---------- ---------- ------\n";

				  uint32_t idx = 0;
				  for (auto pos = m_sect_infos.begin(); pos != m_sect_infos.end();
				       ++pos, ++idx) {
				    ostream << "[" << llvm::format_decimal(idx, 2) << "] ";
				    ObjectFileWasm::DumpSectionHeader(ostream, *pos);
				  }
				}
				[Inline comment] aprantl (Done): ditto

lldb/source/Utility/ArchSpec.cpp

(98 lines not shown; context: static const CoreDefinition g_core_definitions[] = {)
       "thumbv7m"},
      {eByteOrderLittle, 4, 2, 4, llvm::Triple::thumb, ArchSpec::eCore_thumbv7em,
       "thumbv7em"},
      {eByteOrderLittle, 8, 4, 4, llvm::Triple::aarch64,
       ArchSpec::eCore_arm_arm64, "arm64"},
      {eByteOrderLittle, 8, 4, 4, llvm::Triple::aarch64,
       ArchSpec::eCore_arm_armv8, "armv8"},
      {eByteOrderLittle, 4, 2, 4, llvm::Triple::arm,
       ArchSpec::eCore_arm_armv8l, "armv8l"},
      {eByteOrderLittle, 4, 4, 4, llvm::Triple::aarch64_32,
[Inline comment] sbc100 (Done): Is this just clang format being greedy?
[Inline comment] paolosev (Author, Done): Yes, it was edited by clang format.
[Inline comment] teemperor (Done): Can you revert that change? clang-format shouldn't touch these unrelated files if you use git-clang-format or something similar.
       ArchSpec::eCore_arm_arm64_32, "arm64_32"},
      {eByteOrderLittle, 8, 4, 4, llvm::Triple::aarch64,
       ArchSpec::eCore_arm_aarch64, "aarch64"},

      // mips32, mips32r2, mips32r3, mips32r5, mips32r6
      {eByteOrderBig, 4, 2, 4, llvm::Triple::mips, ArchSpec::eCore_mips32,
       "mips"},
      {eByteOrderBig, 4, 2, 4, llvm::Triple::mips, ArchSpec::eCore_mips32r2,
       "mipsr2"},
(97 lines not shown; context: static const CoreDefinition g_core_definitions[] = {)
       ArchSpec::eCore_hexagon_hexagonv4, "hexagonv4"},
      {eByteOrderLittle, 4, 4, 4, llvm::Triple::hexagon,
       ArchSpec::eCore_hexagon_hexagonv5, "hexagonv5"},

      {eByteOrderLittle, 4, 4, 4, llvm::Triple::UnknownArch,
       ArchSpec::eCore_uknownMach32, "unknown-mach-32"},
      {eByteOrderLittle, 8, 4, 4, llvm::Triple::UnknownArch,
       ArchSpec::eCore_uknownMach64, "unknown-mach-64"},
-     {eByteOrderLittle, 4, 2, 4, llvm::Triple::arc, ArchSpec::eCore_arc, "arc"}
+     {eByteOrderLittle, 4, 2, 4, llvm::Triple::arc, ArchSpec::eCore_arc, "arc"},
+
+     {eByteOrderLittle, 4, 1, 4, llvm::Triple::wasm32, ArchSpec::eCore_wasm32,
+      "wasm32"},
  };

  // Ensure that we have an entry in the g_core_definitions for each core. If you
  // comment out an entry above, you will need to comment out the corresponding
[Inline comment] labath (Done): add a trailing comma to avoid subsequent needs to touch this line when adding new entries..
  // ArchSpec::Core enumeration.
  static_assert(sizeof(g_core_definitions) / sizeof(CoreDefinition) ==
                    ArchSpec::kNumCores,
                "make sure we have one core definition for each core");

  struct ArchDefinitionEntry {
    ArchSpec::Core core;
    uint32_t cpu;
(1,231 lines not shown)

lldb/test/Shell/ObjectFile/wasm/basic.yaml

This file was added.

				# RUN: yaml2obj %s > %t
				# RUN: lldb-test object-file %t | FileCheck %s

				# CHECK: Plugin name: wasm
				# CHECK: Architecture: wasm32-unknown-unknown-wasm
				# CHECK: UUID:
				# CHECK: Executable: true
				# CHECK: Stripped: true
				# CHECK: Type: executable
				# CHECK: Strata: user
				# CHECK: Base VM address: 0xa

				# CHECK: Name: code
				# CHECK: Type: code
				# CHECK: VM address: 0x0
				# CHECK: VM size: 56
				# CHECK: File size: 56

				--- !WASM
				FileHeader:
				  Version: 0x00000001
				Sections:
				  - Type: CODE
				    Functions:
				      - Index: 0
				        Locals:
				          - Type: I32
				            Count: 6
				        Body: 238080808000210141102102200120026B21032003200036020C200328020C2104200328020C2105200420056C210620060F0B
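The `VM size: 56` and `Base VM address: 0xa` values checked in this test can be reproduced by hand from the YAML, assuming `yaml2obj` emits the standard code-section encoding with minimal-length ULEB128 sizes. The sketch below is not part of the patch, and the names `ULEB128Size`, `CodeSectionLayout` and `LayoutOneFunction` are hypothetical. It counts the 8-byte module header, one section id byte, the section size, and a single function body with one group of `I32` locals.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Number of bytes needed to encode `value` as minimal unsigned LEB128.
constexpr size_t ULEB128Size(uint64_t value) {
  size_t n = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++n;
  }
  return n;
}

struct CodeSectionLayout {
  size_t payload_offset; // Offset of the code section payload in the module.
  size_t payload_size;   // Size of the code section payload in bytes.
};

// Layout of a module whose first section is a code section holding a single
// function with one group of `local_count` locals of the same type.
constexpr CodeSectionLayout LayoutOneFunction(std::string_view body_hex,
                                              uint32_t local_count) {
  const size_t body_bytes = body_hex.size() / 2; // Two hex digits per byte.
  // Locals vector: group count (1), then per group: count + 1 type byte.
  const size_t locals_bytes = ULEB128Size(1) + ULEB128Size(local_count) + 1;
  const size_t func_bytes = locals_bytes + body_bytes;
  // Payload: function count (1), then the function's size prefix and bytes.
  const size_t payload = ULEB128Size(1) + ULEB128Size(func_bytes) + func_bytes;
  // 8-byte module header + 1 section id byte + section size prefix.
  const size_t payload_offset = 8 + 1 + ULEB128Size(payload);
  return {payload_offset, payload};
}

constexpr std::string_view kBodyHex =
    "238080808000210141102102200120026B21032003200036020C200328020C2104"
    "200328020C2105200420056C210620060F0B";

constexpr CodeSectionLayout kLayout = LayoutOneFunction(kBodyHex, 6);
static_assert(kLayout.payload_size == 56, "matches 'VM size: 56'");
static_assert(kLayout.payload_offset == 0xa, "matches 'Base VM address: 0xa'");
```

The 51-byte body plus 3 bytes of local declarations gives a 54-byte function. With its 1-byte size prefix and the 1-byte function count, the payload comes to 56 bytes and starts at offset 10.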

lldb/test/Shell/ObjectFile/wasm/embedded-debug-sections.yaml

This file was added.

				# RUN: yaml2obj %s > %t
				# RUN: lldb-test object-file %t | FileCheck %s

				# CHECK: Plugin name: wasm
				# CHECK: Architecture: wasm32-unknown-unknown-wasm
				# CHECK: UUID:
				# CHECK: Executable: true
				# CHECK: Stripped: true
				# CHECK: Type: executable
				# CHECK: Strata: user
				# CHECK: Base VM address: 0xa

				# CHECK: Name: code
				# CHECK: Type: code
				# CHECK: VM address: 0x0
				# CHECK: VM size: 56
				# CHECK: File size: 56

				# CHECK: Name: .debug_info
				# CHECK: Type: dwarf-info
				# CHECK: VM address: 0x0
				# CHECK: VM size: 0
				# CHECK: File size: 2

				# CHECK: Name: .debug_abbrev
				# CHECK: Type: dwarf-abbrev
				# CHECK: VM address: 0x0
				# CHECK: VM size: 0
				# CHECK: File size: 2

				# CHECK: Name: .debug_line
				# CHECK: Type: dwarf-line
				# CHECK: VM address: 0x0
				# CHECK: VM size: 0
				# CHECK: File size: 2

				# CHECK: Name: .debug_str
				# CHECK: Type: dwarf-str
				# CHECK: VM address: 0x0
				# CHECK: VM size: 0
				# CHECK: File size: 3

				--- !WASM
				FileHeader:
				  Version: 0x00000001
				Sections:

				  - Type: CODE
				    Functions:
				      - Index: 0
				        Locals:
				          - Type: I32
				            Count: 6
				        Body: 238080808000210141102102200120026B21032003200036020C200328020C2104200328020C2105200420056C210620060F0B
				  - Type: CUSTOM
				    Name: .debug_info
				    Payload: 4C00
				  - Type: CUSTOM
				    Name: .debug_abbrev
				    Payload: 0111
				  - Type: CUSTOM
				    Name: .debug_line
				    Payload: 5100
				  - Type: CUSTOM
				    Name: .debug_str
				    Payload: 636CFF
				...

lldb/test/Shell/ObjectFile/wasm/stripped-debug-sections.yaml

This file was added.

				# RUN: yaml2obj %s > %t
				# RUN: lldb-test object-file %t | FileCheck %s

				# CHECK: Plugin name: wasm
				# CHECK: Architecture: wasm32-unknown-unknown-wasm
				# CHECK: UUID:
				# CHECK: Executable: true
				# CHECK: Stripped: true
				# CHECK: Type: executable
				# CHECK: Strata: user
				# CHECK: Base VM address: 0x0

				# CHECK: Name: .debug_info
				# CHECK: Type: dwarf-info
				# CHECK: VM address: 0x0
				# CHECK: VM size: 0
				# CHECK: File size: 2

				# CHECK: Name: .debug_abbrev
				# CHECK: Type: dwarf-abbrev
				# CHECK: VM address: 0x0
				# CHECK: VM size: 0
				# CHECK: File size: 2

				# CHECK: Name: .debug_line
				# CHECK: Type: dwarf-line
				# CHECK: VM address: 0x0
				# CHECK: VM size: 0
				# CHECK: File size: 2

				# CHECK: Name: .debug_str
				# CHECK: Type: dwarf-str
				# CHECK: VM address: 0x0
				# CHECK: VM size: 0
				# CHECK: File size: 3

				--- !WASM
				FileHeader:
				  Version: 0x00000001
				Sections:

				  - Type: CUSTOM
				    Name: .debug_info
				    Payload: 4C00
				  - Type: CUSTOM
				    Name: .debug_abbrev
				    Payload: 0111
				  - Type: CUSTOM
				    Name: .debug_line
				    Payload: 5100
				  - Type: CUSTOM
				    Name: .debug_str
				    Payload: 636CFF
				...

lldb/tools/lldb-test/SystemInitializerTest.cpp

(50 lines not shown)
  #include "Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCRuntimeV1.h"
  #include "Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCRuntimeV2.h"
  #include "Plugins/LanguageRuntime/RenderScript/RenderScriptRuntime/RenderScriptRuntime.h"
  #include "Plugins/MemoryHistory/asan/MemoryHistoryASan.h"
  #include "Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.h"
  #include "Plugins/ObjectFile/ELF/ObjectFileELF.h"
  #include "Plugins/ObjectFile/Mach-O/ObjectFileMachO.h"
  #include "Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.h"
+ #include "Plugins/ObjectFile/wasm/ObjectFileWasm.h"
  #include "Plugins/Platform/Android/PlatformAndroid.h"
  #include "Plugins/Platform/FreeBSD/PlatformFreeBSD.h"
  #include "Plugins/Platform/Linux/PlatformLinux.h"
  #include "Plugins/Platform/MacOSX/PlatformMacOSX.h"
  #include "Plugins/Platform/MacOSX/PlatformRemoteiOS.h"
  #include "Plugins/Platform/NetBSD/PlatformNetBSD.h"
  #include "Plugins/Platform/OpenBSD/PlatformOpenBSD.h"
  #include "Plugins/Platform/Windows/PlatformWindows.h"
(80 lines not shown)
  llvm::Error SystemInitializerTest::Initialize() {
    if (auto e = SystemInitializerCommon::Initialize())
      return e;

    breakpad::ObjectFileBreakpad::Initialize();
    ObjectFileELF::Initialize();
    ObjectFileMachO::Initialize();
    ObjectFilePECOFF::Initialize();
+   wasm::ObjectFileWasm::Initialize();

    ScriptInterpreterNone::Initialize();


    platform_freebsd::PlatformFreeBSD::Initialize();
    platform_linux::PlatformLinux::Initialize();
    platform_netbsd::PlatformNetBSD::Initialize();
    platform_openbsd::PlatformOpenBSD::Initialize();
(177 lines not shown; context: #if defined(__APPLE__))
    PlatformiOSSimulator::Terminate();
    PlatformDarwinKernel::Terminate();
  #endif

    breakpad::ObjectFileBreakpad::Terminate();
    ObjectFileELF::Terminate();
    ObjectFileMachO::Terminate();
    ObjectFilePECOFF::Terminate();
+   wasm::ObjectFileWasm::Terminate();

    // Now shutdown the common parts, in reverse order.
    SystemInitializerCommon::Terminate();
  }


Diff 238168
