This is an archive of the discontinued LLVM Phabricator instance.

This will work if the min and max opcode byte size are the same, like for arm64, the min and max are 4. This won't work for x86 or arm32 in thumb mode. So when backing up, we need to do an address lookup on the address we think we want to go to, and then adjust the starting address accordingly. Something like:

lldb::SBAddress start_sbaddr(base_addr - bytes_offset, g_vsc.target);

Now we have a section-offset address that can tell us more about what it points to. We can find the SBFunction or SBSymbol for this address and use those to find the right instructions. This will allow us to correctly disassemble code bytes.

We can also look at the section that the memory comes from and see what the section contains. If the section is data, then emit something like:

0x00001000 .byte 0x23
0x00001001 .byte 0x34
...
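
A minimal sketch of the ".byte" fallback described above, assuming the bytes have already been read from the target. `FormatAsByteDirectives` is a hypothetical helper, not something from the patch, and the 8-digit address width simply mirrors the sample output.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not part of lldb-vscode): render raw bytes as one
// ".byte" directive per line, starting at load address `start`.
std::vector<std::string>
FormatAsByteDirectives(uint64_t start, const std::vector<uint8_t> &bytes) {
  std::vector<std::string> lines;
  lines.reserve(bytes.size());
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    char buf[64];
    // Address width of 8 hex digits is an assumption taken from the example.
    std::snprintf(buf, sizeof(buf), "0x%08llx .byte 0x%02x",
                  static_cast<unsigned long long>(start + i),
                  static_cast<unsigned>(bytes[i]));
    lines.emplace_back(buf);
  }
  return lines;
}
```

Calling `FormatAsByteDirectives(0x1000, {0x23, 0x34})` yields the two lines shown in the example above.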

To find the section type we can do:

lldb::SBSection section = start_sbaddr.GetSection();
if (section.IsValid() && section.GetSectionType() == lldb::eSectionTypeCode) {
  // Disassemble from a valid boundary
} else {
  // Emit a byte or long at a time with ".byte 0xXX" or other ASM directive for binary data
}

We need to ensure we start disassembling on a correct instruction boundary, because our math for "start_addr" might land in between actual opcode boundaries. If we are in a lldb::eSectionTypeCode section, then we know we have instructions, and if we are not, then we can emit ".byte" or other binary data directives. So if we do have lldb::eSectionTypeCode as our section type, then we should have a function or symbol, and we can get instructions from those objects easily:

if (section.IsValid() && section.GetSectionType() == lldb::eSectionTypeCode) {
  lldb::SBInstructionList instructions;
  lldb::SBFunction function = start_sbaddr.GetFunction();
  if (function.IsValid()) {
    instructions = function.GetInstructions(g_vsc.target);
  } else {
    lldb::SBSymbol symbol = start_sbaddr.GetSymbol();
    if (symbol.IsValid())
      instructions = symbol.GetInstructions(g_vsc.target);
  }
  const size_t num_instrs = instructions.GetSize();
  if (num_instrs > 0) {
    // We found instructions from a function or symbol and we need to
    // find the matching instruction that we want to start from by iterating
    // over the instructions and finding the right address.
    size_t matching_idx = num_instrs; // Invalid index
    for (size_t i = 0; i < num_instrs; ++i) {
      lldb::SBInstruction inst = instructions.GetInstructionAtIndex(i);
      if (inst.GetAddress().GetLoadAddress(g_vsc.target) >= start_addr) {
        matching_idx = i;
        break;
      }
    }
    if (matching_idx < num_instrs) {
      // Now we can print the instructions from [matching_idx, num_instrs),
      // then we need to repeat the search for the next function or symbol.
      // Note there may be bytes between functions or symbols which we can
      // disassemble by calling _get_instructions_from_memory(...), but we
      // must find the next symbol or function boundary and get back on track.
    }
  }
}

2196	Remove any and all printf or fprintf statements. You can't print anything to stderr or stdout, as this is where the DAP packets are emitted. We do make it so this won't affect lldb-vscode by doing some magic with the STDOUT/STDERR file handles, but this output will most likely be sent to /dev/null. You can print something to a console (using "g_vsc.SendOutput(...)" is one way).
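
To illustrate why stray output matters: the Debug Adapter Protocol frames every message with a `Content-Length` header followed by a blank line and the JSON body. The sketch below uses hypothetical helper names (`FrameDapMessage`, `StartsWithDapHeader`) that are not lldb-vscode functions; it only shows that any text written ahead of a packet on the same stream would break the framing a client expects.

```cpp
#include <string>

// Frame a JSON body the way DAP expects on the wire:
// "Content-Length: <n>\r\n\r\n<body>".
std::string FrameDapMessage(const std::string &json_body) {
  return "Content-Length: " + std::to_string(json_body.size()) + "\r\n\r\n" +
         json_body;
}

// Returns true if `stream` begins with a DAP header, which is what a client
// reading the adapter's stdout expects at every message boundary.
bool StartsWithDapHeader(const std::string &stream) {
  return stream.rfind("Content-Length: ", 0) == 0;
}
```

A stream that starts with debug text, for example `"note\n" + FrameDapMessage("{}")`, no longer begins with a header, so a client parsing it would be out of sync.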

2246

This revision now requires changes to proceed.Jan 4 2023, 6:01 PM

eloparco added inline comments.Jan 5 2023, 2:32 AM

lldb/tools/lldb-vscode/lldb-vscode.cpp
2179	Sorry, I should have provided a proper explanation. I use the maximum instruction size as the "worst case". Basically, I need to read a portion of memory but I do not know the start address and the size. For the start address, if I want to read N instructions before `base_addr` I need to read starting at least from `base_addr - N * max_instruction_size`: if all instructions are of size `max_instruction_size` I will read exactly N instructions; otherwise I will read more than N instructions and prune the additional ones afterwards. The same applies to the size. Since `start_addr` is based on a "worst case", it may be an address in the middle of an instruction. In that case, that first instruction will be misinterpreted, but I think that is negligible. The logic is similar to what is implemented in other VS Code extensions, like https://github.com/vadimcn/vscode-lldb/blob/master/adapter/src/debug_session.rs#L1134. Does it make sense?
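
The worst-case arithmetic described in this comment can be sketched as follows. The names `ReadWindow`, `WorstCaseWindowBefore` and `KeepLast` are hypothetical and not taken from the patch; the sketch covers only the window computation and the pruning step, and does nothing about a start address that falls in the middle of an instruction.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct ReadWindow {
  uint64_t start; // first address to read
  uint64_t size;  // number of bytes to read
};

// Worst-case window that is guaranteed to contain `count` instructions
// before `base_addr` if every instruction has the maximum size.
ReadWindow WorstCaseWindowBefore(uint64_t base_addr, uint64_t count,
                                 uint32_t max_instruction_size) {
  uint64_t bytes = count * max_instruction_size;
  if (bytes > base_addr) // clamp instead of wrapping below address zero
    bytes = base_addr;
  return {base_addr - bytes, bytes};
}

// After decoding, keep only the last `count` instruction addresses, i.e. the
// ones closest to `base_addr`, and prune the surplus at the front.
std::vector<uint64_t> KeepLast(const std::vector<uint64_t> &addrs,
                               std::size_t count) {
  if (addrs.size() <= count)
    return addrs;
  return std::vector<uint64_t>(
      addrs.end() - static_cast<std::ptrdiff_t>(count), addrs.end());
}
```

With fixed 4-byte instructions, `WorstCaseWindowBefore(0x1000, 4, 4)` starts at 0xff0 and spans 16 bytes, so nothing needs pruning; with variable-length encodings the window usually decodes to more than `count` instructions.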

eloparco added inline comments.Jan 5 2023, 2:50 AM

lldb/tools/lldb-vscode/lldb-vscode.cpp
2196	I suppose I have to replace `llvm::errs()` too, right?

Remove printf usage

eloparco marked 2 inline comments as done.Jan 5 2023, 3:07 AM

Harbormaster completed remote builds in B205871: Diff 486514.Jan 5 2023, 3:09 AM

Add integration tests for disassemble request

Harbormaster completed remote builds in B205894: Diff 486554.Jan 5 2023, 6:45 AM

clayborg added inline comments.Jan 5 2023, 3:37 PM

lldb/tools/lldb-vscode/lldb-vscode.cpp
2179	The issue is, you might end up backing up by N bytes, and you might not end up on an opcode boundary. Let's say you have x86 disassembly like:

0x100002e37 <+183>: 48 8b 4d f8        movq   -0x8(%rbp), %rcx
0x100002e3b <+187>: 48 39 c8           cmpq   %rcx, %rax
0x100002e3e <+190>: 0f 85 09 00 00 00  jne    0x100002e4d ; <+205> at main.cpp
0x100002e44 <+196>: 8b 45 94           movl   -0x6c(%rbp), %eax
0x100002e47 <+199>: 48 83 c4 70        addq   $0x70, %rsp
0x100002e4b <+203>: 5d                 popq   %rbp
0x100002e4c <+204>: c3                 retq
0x100002e4d <+205>: e8 7e 0f 00 00     callq  0x100003dd0 ; symbol stub for: __stack_chk_fail
0x100002e52 <+210>: 0f 0b              ud2

Let's say you started with the address 0x100002e4c, and backed up by the max opcode size of 15 for x86_64; that would be 0x100002e3d. You would start disassembling on a non-opcode boundary, as this is the last byte (0xc8) of the 3-byte opcode at 0x100002e3b. And this would disassemble incorrectly. So we need to make use of function or symbol boundaries to ensure we are disassembling correctly. If we have no function or symbol, we can do our best. But as you can see, we would get things completely wrong in this case and we need to fix this as detailed.
2196	Yes! The main issue is: will the user expect to see this output in the console, and will it make sense to them? I don't know what the user will think if they see "current line not found in disassembled instructions" in the debug console. That goes for all output to the console. It will have to make sense to the user. I don't know if the user will care or be able to do anything about this message. It also isn't prefixed with a "warning:" or "error:". I would vote to remove it.
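
The boundary problem in this example can be checked mechanically. The sketch below hard-codes the addresses and sizes from the listing in this comment; `Insn`, `kListing` and `OffsetIntoInstruction` are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct Insn {
  uint64_t addr; // load address of the instruction
  uint32_t size; // encoded length in bytes
};

// Addresses and sizes copied from the x86_64 listing quoted above.
const std::vector<Insn> kListing = {
    {0x100002e37, 4}, {0x100002e3b, 3}, {0x100002e3e, 6},
    {0x100002e44, 3}, {0x100002e47, 4}, {0x100002e4b, 1},
    {0x100002e4c, 1}, {0x100002e4d, 5}, {0x100002e52, 2}};

// Offset of `addr` inside the instruction that contains it; 0 means `addr`
// is a real instruction boundary. Empty if `addr` is outside the listing.
std::optional<uint64_t> OffsetIntoInstruction(const std::vector<Insn> &insns,
                                              uint64_t addr) {
  for (const Insn &insn : insns)
    if (addr >= insn.addr && addr < insn.addr + insn.size)
      return addr - insn.addr;
  return std::nullopt;
}
```

`OffsetIntoInstruction(kListing, 0x100002e4c - 15)` returns 2: backing up by the 15-byte maximum lands on the third byte of the `cmpq` at 0x100002e3b rather than on a boundary.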

eloparco added inline comments.Jan 6 2023, 1:50 AM

lldb/tools/lldb-vscode/lldb-vscode.cpp
2179	Actually, I was misled by the fact that so far I tried on both:
- my ARM machine, with instructions of fixed size (4)
- WAMR disassembling to WASM, where, when the start address is in the middle of an instruction, only that first instruction is misinterpreted
Without going into sections and symbols as you were proposing, the solution can be as simple as what is done in this VS Code extension: https://github.com/vadimcn/vscode-lldb/commit/28ae4f4bf3bd29a0c84dd586cd5360836210ab51. I'll update the code and put some additional comments to clarify the logic. Let me know what you think.

eloparco added inline comments.Jan 6 2023, 1:52 AM

lldb/tools/lldb-vscode/lldb-vscode.cpp
2196	Right, it is just a print to debug, is there anything else I can use for that purpose? Otherwise I'll just get rid of it as you were saying.

eloparco added inline comments.Jan 6 2023, 2:05 AM

lldb/tools/lldb-vscode/lldb-vscode.cpp
2196	Probably better to remove it rather than prefixing the message with "debug"

Fix disassemble on variable-length instructions and disassemble with positive offset

Harbormaster completed remote builds in B206062: Diff 486794.Jan 6 2023, 3:16 AM

I haven't followed the lldb-vscode codebase closely enough to have useful comments on that part of the code.

But so far as I can tell, you didn't add a test for SBTarget::GetMaximumOpcodeByteSize, so you should add that before this goes in. Maybe it was folded into the lldb-vscode tests and I missed it, but even so this should have a direct test; I don't think we should require people to build lldb-vscode to test general features of lldb.

This revision now requires changes to proceed.Jan 10 2023, 3:43 PM

Add binding and test for SBTarget::GetMaximumOpcodeByteSize()

@jingham I added the test that was missing

Harbormaster completed remote builds in B207017: Diff 488117.Jan 11 2023, 2:08 AM

eloparco added a reviewer: labath.Jan 11 2023, 2:08 AM

That looks fine. I'm removing my objection, but that's just to the SB API parts, I'm not commenting on the vscode part.

Are you planning on updating the reverse disassembly code still on this patch?

lldb/include/lldb/API/SBTarget.h
844	Do we still need this API change with the new approach? We can't really use this unless we include GetMinimumOpcodeByteSize() and only use this value to back up if the min and max are the same.
lldb/tools/lldb-vscode/lldb-vscode.cpp
2185–2196	Use C++ comments instead of C style comments.

In D140358#4045210, @clayborg wrote:

Are you planning on updating the reverse disassembly code still on this patch?

What do you mean by "reverse disassembly"?

In D140358#4045442, @eloparco wrote:

In D140358#4045210, @clayborg wrote:

Are you planning on updating the reverse disassembly code still on this patch?

What do you mean by "reverse disassembly"?

I mean when we try and back up by some offset.

eloparco added inline comments.Jan 11 2023, 3:50 PM

lldb/include/lldb/API/SBTarget.h
844	In principle we could assume a maximum opcode size of 16 in the calculation, but knowing the maximum actually supported by the architecture saves us from reading unnecessary chunks of memory. That way, we avoid reading instructions that would then be discarded because they bring the total above `instructionCount`.

In D140358#4045450, @clayborg wrote:

In D140358#4045442, @eloparco wrote:

In D140358#4045210, @clayborg wrote:

Are you planning on updating the reverse disassembly code still on this patch?

What do you mean by "reverse disassembly"?

I mean when we try and back up by some offset.

Sorry, still not clear. So probably not intended for this patch :)

Use single-line comments

eloparco marked an inline comment as done.Jan 11 2023, 4:05 PM

Harbormaster completed remote builds in B207236: Diff 488415.Jan 11 2023, 4:08 PM

In D140358#4045469, @eloparco wrote:

In D140358#4045450, @clayborg wrote:

In D140358#4045442, @eloparco wrote:

In D140358#4045210, @clayborg wrote:

Are you planning on updating the reverse disassembly code still on this patch?

What do you mean by "reverse disassembly"?

I mean when we try and back up by some offset.

Sorry, still not clear. So probably not intended for this patch :)

If you mean setting breakpoints and stepping through disassembled instructions, that's outside the purpose of this initial patch

I mean we cannot just subtract something, any number, from any address unless we have fixed-size opcodes. If we do this for x86, you can get complete garbage with no hope of ever getting back on track, and this disassembly just won't make sense at all and will be useless. I thought my x86 example spelled out why it is bad to back up. If we start disassembling in the middle of an opcode, we can end up disassembling immediate values that are encoded into the middle of an opcode. Since x86 instructions can be 1 to 15 bytes long, we might disassemble garbage and never line up with real opcodes.

So I see a few solutions:

- use functions and symbols to get actual boundaries, and use sections to detect when there are instructions (code) and when there are not
- check if min and max opcode sizes are the same and only try to back up if they are the same, but still use sections to disassemble data as .byte or .long directives
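
The second option can be sketched as a guard around the subtraction. This is a minimal illustration that takes the opcode sizes as plain parameters; `BackUpByInstructions` is a hypothetical name, and how the minimum size would be obtained (for example via a GetMinimumOpcodeByteSize() accessor, as mentioned earlier in this review) is left as an assumption.

```cpp
#include <cstdint>
#include <optional>

// Back up `count` instructions from `base_addr`, but only when the target has
// fixed-size opcodes (min == max). Returns an empty optional otherwise, so the
// caller has to fall back to function/symbol boundaries.
std::optional<uint64_t> BackUpByInstructions(uint64_t base_addr, uint64_t count,
                                             uint32_t min_opcode_size,
                                             uint32_t max_opcode_size) {
  if (min_opcode_size == 0 || min_opcode_size != max_opcode_size)
    return std::nullopt; // variable-length encoding: subtraction is unsafe
  const uint64_t bytes = count * max_opcode_size;
  if (bytes > base_addr)
    return std::nullopt; // would wrap below address zero
  return base_addr - bytes;
}
```

For an arm64-like target (min and max both 4), backing up 4 instructions from 0x1000 gives 0xff0; for an x86-like target (1 and 15) the function declines to guess.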

eloparco added inline comments.Jan 11 2023, 4:46 PM

lldb/tools/lldb-vscode/lldb-vscode.cpp
2179	Actually, I was misled by the fact that so far I tried on both:
- my ARM machine, with instructions of fixed size (4)
- WAMR disassembling to WASM, where, when the start address is in the middle of an instruction, only that first instruction is misinterpreted
Without going into sections and symbols as you were proposing, the solution can be as simple as what is done in this VS Code extension: https://github.com/vadimcn/vscode-lldb/commit/28ae4f4bf3bd29a0c84dd586cd5360836210ab51. I'll update the code and put some additional comments to clarify the logic. Let me know what you think.
2179	@clayborg what I wrote in the comment before this one is what I implemented (Diff 6). Do you think that is not enough? I'll try it on an x86 Linux machine to make sure it works there.

clayborg added inline comments.Jan 11 2023, 5:13 PM

lldb/tools/lldb-vscode/lldb-vscode.cpp
2179	Just checked out your changes, and you are still just subtracting a value from the start address and attempting to disassemble from memory, which is the problem. We need to take that subtracted address and look it up as suggested in the previous code examples I posted. If you find a function or symbol, ask those objects for their instructions, and then try to use those. But basically for _any_ disassembly this is what I would recommend doing:

- first resolve the "start_address" (no matter how you come up with the address) that you want to disassemble into a SBAddress
- check its section. If the section is valid and contains instructions, call a function that will disassemble the address range for the section that starts at "start_address" and ends at the end of the section. We can call this function "disassemble_code". More details on this below.
- If the section does not contain instructions, just read the bytes and emit lines like:

0x1000 .byte 0x12
0x1001 .byte 0x34
...

Now for the disassemble_code function. We know the address range for this is in code. We then need to resolve the address passed to "disassemble_code" into a SBAddress and ask that address for a SBFunction or SBSymbol as I mentioned. Then we ask the SBFunction or SBSymbol for all instructions that they contain, and then use any instructions that fall into the range we have. If there is no SBFunction or SBSymbol, then disassemble an instruction at a time and then see if the new address will resolve to a function or symbol.

eloparco added inline comments.Jan 15 2023, 4:20 PM

lldb/tools/lldb-vscode/lldb-vscode.cpp
2179	Tried my changes on a linux x86 machine and the loop `for (unsigned i = 0; i < max_instruction_size; i++) {` (L2190) takes care of the `start_address` possibly being in the middle of an instruction, so that's not a problem. The problem I faced is that it tries to read too far from `base_addr` and the `ReadMemory()` operation returns few instructions (without reaching `base_addr`). That was not happening on my macOS M1 (arm) machine. To solve it, I changed the loop at L2190 to

for (unsigned i = 0; i < bytes_offset; i++) {
  auto sb_instructions = _get_instructions_from_memory(start_addr + i, disassemble_bytes);

and if `start_addr` is in `sb_instructions` we're done and can exit the loop. That worked.

Another similar thing that can be done is to start from `start_sbaddr` as you were saying, and increment the address until a valid section is found. Then call `_get_instructions_from_memory()` passing the section start. What do you think? Delegating the disassembling to `ReadMemory()` + `GetInstructions()` looks simpler to me than manually iterating over sections and getting instructions from symbols and functions. Is there any shortcoming I'm not seeing?
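
The retry loop described in this comment can be modelled without LLDB. The sketch below uses a toy encoding in which each instruction's first byte holds its total length; this is purely an assumption standing in for a real disassembler, and `DecodeReaches` and `FindStartOffset` are hypothetical names. It tries successive start offsets until decoding lands exactly on the target index.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy stand-in for a disassembler: the byte at an instruction's start is its
// total length. Returns true if decoding from `from` lands exactly on `target`.
bool DecodeReaches(const std::vector<uint8_t> &mem, std::size_t from,
                   std::size_t target) {
  std::size_t pos = from;
  while (pos < target && pos < mem.size()) {
    const uint8_t len = mem[pos];
    if (len == 0)
      return false; // undecodable byte in this toy encoding
    pos += len;
  }
  return pos == target;
}

// Try start offsets 0, 1, 2, ... (fewer than `max_skip`) and return the first
// one from which decoding reaches `target`.
std::optional<std::size_t> FindStartOffset(const std::vector<uint8_t> &mem,
                                           std::size_t max_skip,
                                           std::size_t target) {
  for (std::size_t i = 0; i < max_skip && i <= target; ++i)
    if (DecodeReaches(mem, i, target))
      return i;
  return std::nullopt;
}
```

Note that reaching the target only shows that one decode path lines up with it; it does not prove that the earlier instructions on that path are the real ones, which is the concern raised elsewhere in this review.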

clayborg added inline comments.Jan 16 2023, 1:34 PM

lldb/tools/lldb-vscode/lldb-vscode.cpp
2179	So your

for (unsigned i = 0; i < max_instruction_size; i++) {

disassembles and tries to make sure you make it back to the original base address from the original disassemble packet? That can work, but could be a bit time consuming.

The main issue, as you saw on x86, is you don't know what is in memory. You could have unreadable areas of memory when trying to disassemble. Also, if you do have good memory that does contain instructions, there can be padding in between functions, or even data between functions that the function accesses, that can't be correctly disassembled and could throw things off again.

The memory regions are the safest way to traverse memory to see what you have and would help you deal with holes in the memory. You can ask about a memory region with:

lldb::SBError SBProcess::GetMemoryRegionInfo(lldb::addr_t load_addr, lldb::SBMemoryRegionInfo &region_info);

If you ask about an invalid address, like zero on most platforms, it will return a valid "region_info" with the bounds of the unreadable address range filled in and no read/write/execute permissions:

(lldb) script
Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.
>>> region = lldb.SBMemoryRegionInfo()
>>> err = lldb.process.GetMemoryRegionInfo(0, region)
[0x0000000000000000-0x0000000100000000 ---]

So you could use the memory region info to your advantage here. If you have execute permissions, disassemble as instructions, and if you don't, emit ".byte 0xXX" for each byte. If there are no permissions, you can emit some other string like "0x00000000: <invalid memory>".

That being said, even when you do find an executable section of memory, there can be different stuff there even if a section _is_ executable. For instance, if we ask about the next memory region on a mac M1:

>>> err = lldb.process.GetMemoryRegionInfo(0x0000000100000000, region)
>>> print(region)
[0x0000000100000000-0x0000000100004000 R-X]

Notice that this is read + execute (you can access these via region.IsReadable(), region.IsWritable() and region.IsExecutable()). But at this address, this is the mach-o header, which doesn't make sense to try and disassemble:

(lldb) memory read 0x0000000100000000
0x100000000: cf fa ed fe 0c 00 00 01 00 00 00 00 02 00 00 00 ................
0x100000010: 11 00 00 00 18 03 00 00 85 00 20 00 00 00 00 00 .......... .....
(lldb) memory read -fx -s4 -c 4 0x0000000100000000
0x100000000: 0xfeedfacf 0x0100000c 0x00000000 0x00000002

0xfeedfacf is the mach-o magic bytes for little endian 64 bit mach-o files.

(lldb) disassemble --start-address 0x0000000100000000
a.out`_mh_execute_header:
0x100000000 <+0>: .long 0xfeedfacf ; unknown opcode
0x100000004 <+4>: .long 0x0100000c ; unknown opcode
0x100000008 <+8>: udf #0x0
0x10000000c <+12>: udf #0x2
0x100000010 <+16>: udf #0x11
0x100000014 <+20>: udf #0x318
0x100000018 <+24>: .long 0x00200085 ; unknown opcode
0x10000001c <+28>: udf #0x0

So this is the main reason why I would suggest just disassembling using .byte or .long when we aren't in a function or symbol.
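
One way to express the decision described in this comment is a small classifier over the region permissions. This is a sketch only: `RegionPerms` is a hypothetical plain struct mirroring the `IsReadable()`, `IsWritable()` and `IsExecutable()` queries, `in_function_or_symbol` stands for the SBFunction/SBSymbol lookup discussed earlier, and treating every unreadable region as invalid is an assumption that slightly extends the "no permissions" case.

```cpp
enum class Rendering { Instructions, ByteDirectives, InvalidMemory };

// Hypothetical mirror of the SBMemoryRegionInfo permission queries.
struct RegionPerms {
  bool readable;
  bool writable;
  bool executable;
};

// Decide how to present the bytes at an address.
Rendering ChooseRendering(const RegionPerms &perms,
                          bool in_function_or_symbol) {
  if (!perms.readable)
    return Rendering::InvalidMemory; // e.g. "0x00000000: <invalid memory>"
  if (perms.executable && in_function_or_symbol)
    return Rendering::Instructions; // safe to ask for real instructions
  return Rendering::ByteDirectives; // emit ".byte 0xXX" (or ".long")
}
```

An executable region that resolves to no function or symbol falls through to `.byte` output, which matches the recommendation at the end of the comment.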

eloparco added inline comments.Jan 17 2023, 1:15 AM

lldb/tools/lldb-vscode/lldb-vscode.cpp
2179	Understood, so in that case we also have to do the same for `_handle_disassemble_positive_offset`, because even there, when we use `ReadInstructions()`, we could have some unreadable areas in the middle, and `ReadInstructions()` would stop prematurely when running into one of them, returning fewer instructions than expected.

Revision Contents

Path                                                  Size
lldb/include/lldb/API/SBTarget.h                      2 lines
lldb/source/API/SBTarget.cpp                          10 lines
lldb/tools/lldb-vscode/CMakeLists.txt                 1 line
lldb/tools/lldb-vscode/DisassembledInstruction.h      19 lines
lldb/tools/lldb-vscode/DisassembledInstruction.cpp    27 lines
(seven further entries, paths not preserved)          20, 11, 22, 9, 1, 1, 197 lines

Diff 484140

lldb/include/lldb/API/SBTarget.h

@@ public:
    lldb::SBValue EvaluateExpression(const char *expr);

    lldb::SBValue EvaluateExpression(const char *expr,
                                     const SBExpressionOptions &options);

    lldb::addr_t GetStackRedZoneSize();

+   uint32_t GetMaximumOpcodeByteSize() const;

    bool IsLoaded(const lldb::SBModule &module) const;

    lldb::SBLaunchInfo GetLaunchInfo() const;

    void SetLaunchInfo(const lldb::SBLaunchInfo &launch_info);

    /// Get a \a SBTrace object the can manage the processor trace information of
    /// this target.
...

lldb/source/API/SBTarget.cpp

@@ if (addr_ptr) {
            target_sp->GetArchitecture(), nullptr, flavor_string, *addr_ptr,
            data.GetBytes(), bytes_read, count, data_from_file));
      }
    }

    return sb_instructions;
  }

+ uint32_t SBTarget::GetMaximumOpcodeByteSize() const {
+   LLDB_INSTRUMENT_VA(this);
+
+   TargetSP target_sp(GetSP());
+   if (target_sp)
+     return target_sp->GetArchitecture().GetMaximumOpcodeByteSize();
+
+   return 0;
+ }

  lldb::SBInstructionList SBTarget::GetInstructions(lldb::SBAddress base_addr,
                                                    const void *buf,
                                                    size_t size) {
    LLDB_INSTRUMENT_VA(this, base_addr, buf, size);

    return GetInstructionsWithFlavor(base_addr, nullptr, buf, size);
  }

...

lldb/tools/lldb-vscode/CMakeLists.txt

...
  # not re-export those.
  set(LLVM_LINK_COMPONENTS Support)
  set(LLVM_TARGET_DEFINITIONS Options.td)
  tablegen(LLVM Options.inc -gen-opt-parser-defs)
  add_public_tablegen_target(LLDBVSCodeOptionsTableGen)
  add_lldb_tool(lldb-vscode
    lldb-vscode.cpp
    BreakpointBase.cpp
+   DisassembledInstruction.cpp
    ExceptionBreakpoint.cpp
    FifoFiles.cpp
    FunctionBreakpoint.cpp
    IOStream.cpp
    JSONUtils.cpp
    LLDBUtils.cpp
    OutputRedirector.cpp
    ProgressEvent.cpp
...

lldb/tools/lldb-vscode/DisassembledInstruction.h

This file was added.

#ifndef LLDB_TOOLS_LLDB_VSCODE_DISASSEMBLED_INSTRUCTION_H
#define LLDB_TOOLS_LLDB_VSCODE_DISASSEMBLED_INSTRUCTION_H

#include "VSCodeForward.h"
#include <string>

namespace lldb_vscode {

struct DisassembledInstruction {
  std::string m_address;
  std::string m_instruction;

  DisassembledInstruction();
  DisassembledInstruction(lldb::SBInstruction &inst);
};

} // namespace lldb_vscode

#endif // LLDB_TOOLS_LLDB_VSCODE_DISASSEMBLED_INSTRUCTION_H
No newline at end of file

lldb/tools/lldb-vscode/DisassembledInstruction.cpp

This file was added.

#include "DisassembledInstruction.h"

#include "LLDBUtils.h"
#include "VSCode.h"
#include "lldb/API/SBInstruction.h"

namespace lldb_vscode {

DisassembledInstruction::DisassembledInstruction()
    : m_address("0x0000000000000000"), m_instruction(" <invalid>") {}

DisassembledInstruction::DisassembledInstruction(lldb::SBInstruction &inst) {
  const auto inst_addr = inst.GetAddress().GetLoadAddress(g_vsc.target);
  const char *m = inst.GetMnemonic(g_vsc.target);
  const char *o = inst.GetOperands(g_vsc.target);
  const char *c = inst.GetComment(g_vsc.target);

  std::string line;
  llvm::raw_string_ostream line_strm(line);
  const auto comment_sep = (c == nullptr || std::string(c) == "") ? "" : " ; ";
  line_strm << llvm::formatv("{0,12} {1}{2}{3}", m, o, comment_sep, c);

  m_address = addr_to_hex_string(inst_addr);
  m_instruction = line_strm.str();
}

} // namespace lldb_vscode
No newline at end of file

lldb/tools/lldb-vscode/JSONUtils.h

...
  ///
  /// This function will fill in the following keys in the returned
  /// object:
  /// "id" - the stack frame ID as an integer
  /// "name" - the function name as a string
  /// "source" - source file information as a "Source" VSCode object
  /// "line" - the source file line number as an integer
  /// "column" - the source file column number as an integer
+ /// "instructionPointerReference" - a memory reference for the current
+ /// instruction pointer in this frame
  ///
  /// \param[in] frame
  /// The LLDB stack frame to use when populating out the "StackFrame"
  /// object.
  ///
  /// \return
  /// A "StackFrame" JSON object with that follows the formal JSON
  /// definition outlined by Microsoft.
...
  ///
  /// \return
  /// A "Variable" JSON object with that follows the formal JSON
  /// definition outlined by Microsoft.
  llvm::json::Value CreateVariable(lldb::SBValue v, int64_t variablesReference,
                                   int64_t varID, bool format_hex,
                                   bool is_name_duplicated = false);

+ /// Create a "DisassembledInstruction" object for a LLDB disassembled
+ /// instruction object.
+ ///
+ /// This function will fill in the following keys in the returned
+ /// object:
+ /// "address" - the address of the instruction
+ /// "instruction" - the text representing the instruction and its operands
+ ///
+ /// \param[in] instruction
+ /// The LLDB disassembled instruction to use when populating out the
+ /// "DisassembledInstruction" object.
+ ///
+ /// \return
+ /// A "DisassembledInstruction" JSON object with that follows the formal
+ /// JSON definition outlined by Microsoft.
+ llvm::json::Value
+ CreateDisassembledInstruction(DisassembledInstruction instruction);

  llvm::json::Value CreateCompileUnit(lldb::SBCompileUnit unit);

  /// Create a runInTerminal reverse request object
  ///
  /// \param[in] launch_request
  /// The original launch_request object whose fields are used to construct
  /// the reverse request object.
  ///
...

lldb/tools/lldb-vscode/JSONUtils.cpp

@@ if (disasm_line > 0) {
      object.try_emplace("line", disasm_line);
    } else {
      auto line = line_entry.GetLine();
      if (line == UINT32_MAX)
        line = 0;
      object.try_emplace("line", line);
    }
    object.try_emplace("column", line_entry.GetColumn());

+   auto pc = addr_to_hex_string(frame.GetPC());
+   object.try_emplace("instructionPointerReference", pc);
    return llvm::json::Value(std::move(object));
  }

  // "Thread": {
  // "type": "object",
  // "description": "A Thread",
  // "properties": {
  // "id": {
@@ llvm::json::Value CreateCompileUnit(lldb::SBCompileUnit unit) {
    llvm::json::Object object;
    char unit_path_arr[PATH_MAX];
    unit.GetFileSpec().GetPath(unit_path_arr, sizeof(unit_path_arr));
    std::string unit_path(unit_path_arr);
    object.try_emplace("compileUnitPath", unit_path);
    return llvm::json::Value(std::move(object));
  }

+ llvm::json::Value
+ CreateDisassembledInstruction(DisassembledInstruction instruction) {
+   llvm::json::Object object;
+   EmplaceSafeString(object, "address", instruction.m_address);
+   EmplaceSafeString(object, "instruction", instruction.m_instruction);
+   return llvm::json::Value(std::move(object));
+ }

  /// See
  /// https://microsoft.github.io/debug-adapter-protocol/specification#Reverse_Requests_RunInTerminal
  llvm::json::Object
  CreateRunInTerminalReverseRequest(const llvm::json::Object &launch_request,
                                    llvm::StringRef debug_adaptor_path,
                                    llvm::StringRef comm_file) {
    llvm::json::Object reverse_request;
    reverse_request.try_emplace("type", "request");
...

lldb/tools/lldb-vscode/LLDBUtils.h

  //===-- LLDBUtils.h ---------------------------------------------- C++ --===//
  //
  // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
  // See https://llvm.org/LICENSE.txt for license information.
  // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
  //
  //===----------------------------------------------------------------------===//

  #ifndef LLDB_TOOLS_LLDB_VSCODE_LLDBUTILS_H
  #define LLDB_TOOLS_LLDB_VSCODE_LLDBUTILS_H

  #include "VSCodeForward.h"
+ #include "lldb/lldb-types.h"
  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/StringRef.h"
  #include "llvm/Support/raw_ostream.h"
  #include <string>
  #include <vector>

  namespace lldb_vscode {

...
  ///
  /// \param[in] dap_frame_id
  /// The VSCode frame ID to convert to a frame ID.
  ///
  /// \return
  /// The LLDB frame index ID.
  uint32_t GetLLDBFrameID(uint64_t dap_frame_id);

+ /// Given an address, convert it to its hexadecimal representation.
+ ///
+ /// \param[in] address
+ /// The address to convert.
+ ///
+ /// \return
+ /// The hexadecimal representation of the address.
+ std::string addr_to_hex_string(const lldb::addr_t address);
+
+ /// Given an hexadecimal representation of an address, convert it to a number.
+ ///
+ /// Reverse of `addr_to_hex_string()`.
+ ///
+ /// \param[in] hex_address
+ /// The hexadecimal address to convert.
+ ///
+ /// \return
+ /// The decimal representation of the hex address.
+ lldb::addr_t
+ hex_string_to_addr(const std::optional<llvm::StringRef> hex_address);

  } // namespace lldb_vscode

  #endif

lldb/tools/lldb-vscode/LLDBUtils.cpp

...
uint32_t GetLLDBFrameID(uint64_t dap_frame_id) {
  return dap_frame_id & ((1u << THREAD_INDEX_SHIFT) - 1);
}

int64_t MakeVSCodeFrameID(lldb::SBFrame &frame) {
  return (int64_t)(frame.GetThread().GetIndexID() << THREAD_INDEX_SHIFT |
                   frame.GetFrameID());
}

std::string addr_to_hex_string(const lldb::addr_t address) {
  return "0x" + llvm::utohexstr(address, true);
}

lldb::addr_t
hex_string_to_addr(const std::optional<llvm::StringRef> hex_address) {
  return std::stoull(hex_address->data(), nullptr, 16);
}

} // namespace lldb_vscode
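
For comparison, here is a minimal standalone sketch of a stricter hex parser. It is not part of the patch: `parse_hex_address` is a hypothetical name, and the sketch uses only the C++17 standard library, with `std::string_view` standing in for `llvm::StringRef`. Unlike calling `std::stoull` on `hex_address->data()`, it does not assume a null-terminated buffer or a non-empty optional, and it reports malformed input through its return value instead of throwing.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper (not from the patch): parses "0x1234" or "1234" as a
// hexadecimal address. Returns std::nullopt for empty, malformed, or
// out-of-range input.
inline std::optional<std::uint64_t> parse_hex_address(std::string_view text) {
  // std::from_chars does not accept a "0x" prefix, so strip it first.
  if (text.size() >= 2 && text[0] == '0' && (text[1] == 'x' || text[1] == 'X'))
    text.remove_prefix(2);
  if (text.empty())
    return std::nullopt;

  std::uint64_t value = 0;
  const char *first = text.data();
  const char *last = first + text.size();
  const std::from_chars_result result = std::from_chars(first, last, value, 16);

  // Require that the whole string was consumed and no error occurred.
  if (result.ec != std::errc() || result.ptr != last)
    return std::nullopt;
  return value;
}
```

A caller could then treat a missing or unparsable `memoryReference` as a failed request instead of relying on a sentinel address.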

lldb/tools/lldb-vscode/VSCode.h

...
#include "lldb/API/SBLineEntry.h"
#include "lldb/API/SBListener.h"
#include "lldb/API/SBProcess.h"
#include "lldb/API/SBStream.h"
#include "lldb/API/SBStringList.h"
#include "lldb/API/SBTarget.h"
#include "lldb/API/SBThread.h"

#include "DisassembledInstruction.h"
#include "ExceptionBreakpoint.h"
#include "FunctionBreakpoint.h"
#include "IOStream.h"
#include "ProgressEvent.h"
#include "RunInTerminal.h"
#include "SourceBreakpoint.h"
#include "SourceReference.h"

...

lldb/tools/lldb-vscode/VSCodeForward.h

...
#define LLDB_TOOLS_LLDB_VSCODE_VSCODEFORWARD_H

namespace lldb_vscode {
struct BreakpointBase;
struct ExceptionBreakpoint;
struct FunctionBreakpoint;
struct SourceBreakpoint;
struct SourceReference;
struct DisassembledInstruction;
} // namespace lldb_vscode

namespace lldb {
class SBAttachInfo;
class SBBreakpoint;
class SBBreakpointLocation;
class SBCommandInterpreter;
class SBCommandReturnObject;
...

lldb/tools/lldb-vscode/lldb-vscode.cpp

...

public:

LLDBVSCodeOptTable() : OptTable(InfoTable, true) {}

};

typedef void (*RequestCallback)(const llvm::json::Object &command);

enum LaunchMethod { Launch, Attach, AttachForSuspendedLaunch };

lldb::SBFrame g_curr_frame;

lldb::SBValueList *GetTopLevelScope(int64_t variablesReference) {

switch (variablesReference) {

case VARREF_LOCALS:

return &g_vsc.variables.locals;

case VARREF_GLOBALS:

return &g_vsc.variables.globals;

case VARREF_REGS:

return &g_vsc.variables.registers;

...

void request_initialize(const llvm::json::Object &request) {

...

};

auto arguments = request.getObject("arguments");

// sourceInitFile option is not from formal DAP specification. It is only

// used by unit tests to prevent sourcing .lldbinit files from environment

// which may affect the outcome of tests.

bool source_init_file = GetBoolean(arguments, "sourceInitFile", true);

g_vsc.debugger = lldb::SBDebugger::Create(source_init_file, log_cb, nullptr);

g_vsc.progress_event_thread = std::thread(ProgressEventThreadFunction);

// Start our event thread so we can receive events from the debugger, target,

// process and more.

g_vsc.event_thread = std::thread(EventThreadFunction);

llvm::json::Object response;

FillResponse(request, response);

...

body.try_emplace("supportsDelayedStackTraceLoading", true);

// The debug adapter supports the 'loadedSources' request.

body.try_emplace("supportsLoadedSourcesRequest", false);

// The debug adapter supports sending progress reporting events.

body.try_emplace("supportsProgressReporting", true);

// The debug adapter supports 'logMessage' in breakpoint.

body.try_emplace("supportsLogPoints", true);

body.try_emplace("supportsDisassembleRequest", true);

response.try_emplace("body", std::move(body));

g_vsc.SendJSON(llvm::json::Value(std::move(response)));

}

llvm::Error request_runInTerminal(const llvm::json::Object &launch_request) {

g_vsc.is_attach = true;

lldb::SBAttachInfo attach_info;

...

void request_setBreakpoints(const llvm::json::Object &request) {

...

}

llvm::json::Object body;

body.try_emplace("breakpoints", std::move(response_breakpoints));

response.try_emplace("body", std::move(body));

g_vsc.SendJSON(llvm::json::Value(std::move(response)));

}

std::vector<lldb::SBInstruction>

_get_instructions_from_memory(lldb::addr_t start, uint64_t count,

lldb::addr_t end) {

lldb::SBProcess process = g_vsc.target.GetProcess();

lldb::SBError error;

std::vector<uint8_t> buffer(count, 0);

const size_t bytes_read __attribute__((unused)) = process.ReadMemory(

start, static_cast<void *>(buffer.data()), count, error);

assert(bytes_read == count && error.Success() &&

"unable to read byte range from memory");

// If base_addr starts in the middle of an instruction,

// that first instruction will not be parsed correctly (negligible)

std::vector<lldb::SBInstruction> sb_instructions;

const auto base_addr = lldb::SBAddress(start, g_vsc.target);

lldb::SBInstructionList instructions =

g_vsc.target.GetInstructions(base_addr, buffer.data(), count);

for (size_t i = 0; i < instructions.GetSize(); i++) {

auto instr = instructions.GetInstructionAtIndex(i);

if (instr.GetAddress().GetLoadAddress(g_vsc.target) > end)

break;

sb_instructions.emplace_back(instr);

}

return sb_instructions;

}

std::pair<lldb::addr_t, lldb::addr_t> _get_frame_boundary() {

assert(g_curr_frame.IsValid());

auto function = g_curr_frame.GetFunction();

if (!function.IsValid())

return std::make_pair<>(LLDB_INVALID_ADDRESS, LLDB_INVALID_ADDRESS);

return std::make_pair<>(

function.GetStartAddress().GetLoadAddress(g_vsc.target),

function.GetEndAddress().GetLoadAddress(g_vsc.target));

}

auto _handle_disassemble_positive_offset(lldb::addr_t base_addr,

int64_t instruction_offset,

uint64_t instruction_count) {

llvm::json::Array response_instructions;

auto start_addr = lldb::SBAddress(base_addr, g_vsc.target);

lldb::SBInstructionList instructions = g_vsc.target.ReadInstructions(

start_addr, instruction_offset + instruction_count);

std::vector<DisassembledInstruction> dis_instructions;

const auto num_instrs_to_skip = static_cast<size_t>(instruction_offset);

for (size_t i = num_instrs_to_skip; i < instructions.GetSize(); ++i) {

lldb::SBInstruction instr = instructions.GetInstructionAtIndex(i);

auto disass_instr =

CreateDisassembledInstruction(DisassembledInstruction(instr));

clayborgUnsubmitted

Not Done

lldb::SBAddress start_sbaddr(base_addr - bytes_offset, g_vsc.target);

We can also look at the section that the memory comes from and see what the section contains. If the section is data, then emit something like:

0x00001000 .byte 0x23
0x00001001 .byte 0x34
...

To find the section type we can do:

SBSection section = start_sbaddr.GetSection();
if (section.IsValid() && section.GetSectionType() == lldb::eSectionTypeCode) {
 // Disassemble from a valid boundary
} else {
  // Emit a byte or long at a time with ".byte 0xXX" or other ASM directive for binary data
}

if (section.IsValid() && section.GetSectionType() == lldb::eSectionTypeCode) {
  lldb::SBInstructionList instructions;
  lldb::SBFunction function = start_sbaddr.GetFunction();
  if (function.IsValid()) {
    instructions = function.GetInstructions(g_vsc.target);
  } else {
    lldb::SBSymbol symbol = start_sbaddr.GetSymbol();
    if (symbol.IsValid())
      instructions = symbol.GetInstructions(g_vsc.target);
  }
  const size_t num_instrs = instructions.GetSize();
  if (num_instrs > 0) {
    // We found instructions from a function or symbol and we need to
    // find the matching instruction that we want to start from by iterating
    // over the instructions and finding the right address
    size_t matching_idx = num_instrs; // Invalid index
    for (size_t i = 0; i < num_instrs; ++i) {
      lldb::SBInstruction inst = instructions.GetInstructionAtIndex(i);
      if (inst.GetAddress().GetLoadAddress(g_vsc.target) >= start_addr) {
        matching_idx = i;
        break;
      }
    }
    if (matching_idx < num_instrs) {
      // Now we can print the instructions from [matching_idx, num_instrs),
      // then we need to repeat the search for the next function or symbol.
      // Note there may be bytes between functions or symbols which we can
      // disassemble by calling _get_instructions_from_memory(...), but we must
      // find the next symbol or function boundary and get back on track
    }
  }
}


eloparcoAuthorUnsubmitted

Done

Sorry, I should have provided a proper explanation.

I use the maximum instruction size as the "worst case". Basically, I need to read a portion of memory, but I know neither the start address nor the size. For the start address, if I want to read N instructions before base_addr, I need to start reading from base_addr - N * max_instruction_size: if all instructions are of size max_instruction_size I will read exactly N instructions; otherwise I will read more than N instructions and prune the additional ones afterwards. The same applies to the size.

Since start_addr is based on a "worst case", it may be an address in the middle of an instruction. In that case, that first instruction will be misinterpreted, but I think that is negligible.
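
The worst-case arithmetic described above can be sketched in isolation as follows. This is only an illustration under assumptions: `ReadWindow` and `worst_case_window` are hypothetical names that do not appear in the patch, and the clamp at address zero is an addition of the sketch.

```cpp
#include <cstdint>

// Hypothetical type (not from the patch): the memory range to read when
// backing up a number of instructions from a base address.
struct ReadWindow {
  std::uint64_t start; // first byte to read
  std::uint64_t size;  // number of bytes to read
};

// Worst case: assume every instruction is `max_opcode_size` bytes long.
// The start is clamped at zero so the subtraction cannot wrap around.
inline ReadWindow worst_case_window(std::uint64_t base_addr,
                                    std::uint64_t instructions_before,
                                    std::uint64_t instruction_count,
                                    std::uint64_t max_opcode_size) {
  const std::uint64_t back = instructions_before * max_opcode_size;
  const std::uint64_t start = back > base_addr ? 0 : base_addr - back;
  return {start, instruction_count * max_opcode_size};
}
```

On a fixed-width target (4-byte instructions) the window always starts on an instruction boundary; on a variable-width target nothing guarantees that.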

The logic is similar to what is implemented in other VS Code extensions, like https://github.com/vadimcn/vscode-lldb/blob/master/adapter/src/debug_session.rs#L1134.

Does it make sense?


clayborgUnsubmitted

Not Done

The issue is, you might end up backing up by N bytes, and you might not end up on an opcode boundary. Let's say you have x86 disassembly like:

0x100002e37 <+183>: 48 8b 4d f8              movq   -0x8(%rbp), %rcx
0x100002e3b <+187>: 48 39 c8                 cmpq   %rcx, %rax
0x100002e3e <+190>: 0f 85 09 00 00 00        jne    0x100002e4d               ; <+205> at main.cpp
0x100002e44 <+196>: 8b 45 94                 movl   -0x6c(%rbp), %eax
0x100002e47 <+199>: 48 83 c4 70              addq   $0x70, %rsp
0x100002e4b <+203>: 5d                       popq   %rbp
0x100002e4c <+204>: c3                       retq   
0x100002e4d <+205>: e8 7e 0f 00 00           callq  0x100003dd0               ; symbol stub for: __stack_chk_fail
0x100002e52 <+210>: 0f 0b                    ud2

Let's say you started with the address 0x100002e4c and backed up by the max opcode size of 15 for x86_64; that would be 0x100002e3d. You would start disassembling on a non-opcode boundary, as this is the last byte (0xc8) of the 3-byte opcode at 0x100002e3b. And this would disassemble incorrectly. So we need to make use of the function or symbol boundaries to ensure we are disassembling correctly. If we have no function or symbol, we can do our best. But as you can see, we would get things completely wrong in this case, and we need to fix this as detailed.
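
To make the arithmetic concrete, the following standalone sketch encodes the start addresses and sizes from the listing above and checks where `0x100002e4c - 15` lands. The type and function names are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One decoded instruction: load address and encoded size in bytes.
struct Insn {
  std::uint64_t addr;
  std::uint32_t size;
};

// Start addresses and sizes copied from the x86_64 listing above.
inline std::vector<Insn> listing_from_comment() {
  return {{0x100002e37, 4}, {0x100002e3b, 3}, {0x100002e3e, 6},
          {0x100002e44, 3}, {0x100002e47, 4}, {0x100002e4b, 1},
          {0x100002e4c, 1}, {0x100002e4d, 5}, {0x100002e52, 2}};
}

// Returns the index of the instruction whose bytes cover `addr`, or -1 if
// no instruction in the list covers it.
inline int covering_instruction(const std::vector<Insn> &insns,
                                std::uint64_t addr) {
  for (std::size_t i = 0; i < insns.size(); ++i)
    if (addr >= insns[i].addr && addr < insns[i].addr + insns[i].size)
      return static_cast<int>(i);
  return -1;
}

// True only if `addr` is the first byte of an instruction in the list.
inline bool is_instruction_start(const std::vector<Insn> &insns,
                                 std::uint64_t addr) {
  const int i = covering_instruction(insns, addr);
  return i >= 0 && insns[static_cast<std::size_t>(i)].addr == addr;
}
```

With this data, `covering_instruction(listing_from_comment(), 0x100002e4c - 15)` selects the `cmpq` entry at 0x100002e3b, and `is_instruction_start` is false for that address.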


eloparcoAuthorUnsubmitted

Done

Actually, I was misled by the fact that so far I had only tried on:

- my ARM machine, with instructions of fixed size (4)
- WAMR disassembling to WASM, where, when the start address is in the middle of an instruction, only that first instruction is misinterpreted

Without going into sections and symbols as you were proposing, the solution can be as easy as done in this VS Code Extension: https://github.com/vadimcn/vscode-lldb/commit/28ae4f4bf3bd29a0c84dd586cd5360836210ab51.

I'll update the code and put some additional comments to clarify the logic. Let me know what you think.


eloparcoAuthorUnsubmitted

Done

@clayborg what I wrote in the comment before this one is what I implemented (Diff 6). Do you think that is not enough? I'll try it on an x86 Linux machine to make sure it works there.


clayborgUnsubmitted

Not Done

Just checked out your changes, and you are still just subtracting a value from the start address and attempting to disassemble from memory, which is the problem. We need to take that subtracted address and look it up as suggested in the previous code examples I posted. If you find a function or symbol, ask those objects for their instructions, and then try to use those.

But basically for _any_ disassembly this is what I would recommend doing:

- first resolve the "start_address" (no matter how you come up with the address) that you want to disassemble into an SBAddress
- check its section. If the section is valid and contains instructions, call a function that will disassemble the address range for the section that starts at "start_address" and ends at the end of the section. We can call this "disassemble_code" as a function. More details on this below
- If the section does not contain instructions, just read the bytes and emit lines like:

0x1000 .byte 0x12
0x1001 .byte 0x34
...

Now for the disassemble_code function. We know the address range for this is in code. We then need to resolve the address passed to "disassemble_code" into an SBAddress and ask that address for an SBFunction or SBSymbol as I mentioned. Then we ask the SBFunction or SBSymbol for all instructions that they contain, and then use any instructions that fall into the range we have. If there is no SBFunction or SBSymbol, then disassemble an instruction at a time and then see if the new address will resolve to a function or symbol.
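
The data branch of the steps above (one `.byte` line per byte) could look roughly like the sketch below. `format_as_byte_directives` is a hypothetical helper, and the zero-padded 8-digit address format is an assumption; the real adapter would place these strings in the DAP response rather than return them as a vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not from the patch): renders raw bytes as one
// ".byte 0xXX" line per byte, with the address advancing by one each line.
inline std::vector<std::string>
format_as_byte_directives(std::uint64_t start_addr,
                          const std::vector<std::uint8_t> &bytes) {
  std::vector<std::string> lines;
  lines.reserve(bytes.size());
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    char buf[48]; // large enough for a 64-bit address plus the directive
    std::snprintf(buf, sizeof(buf), "0x%08llx .byte 0x%02x",
                  static_cast<unsigned long long>(start_addr + i),
                  static_cast<unsigned>(bytes[i]));
    lines.emplace_back(buf);
  }
  return lines;
}
```

For example, `format_as_byte_directives(0x1000, {0x12, 0x34})` yields the lines `0x00001000 .byte 0x12` and `0x00001001 .byte 0x34`.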


eloparcoAuthorUnsubmitted

Done

Tried my changes on a linux x86 machine and the loop for (unsigned i = 0; i < max_instruction_size; i++) { (L2190) takes care of the start_address possibly being in the middle of an instruction, so that's not a problem. The problem I faced is that it tries to read too far from base_addr and the ReadMemory() operation returns few instructions (without reaching base_addr). That was not happening on my macOS M1 (arm) machine.

To solve, I changed the loop at L2190 to

for (unsigned i = 0; i < bytes_offset; i++) {
    auto sb_instructions =
        _get_instructions_from_memory(start_addr + i, disassemble_bytes);

and if start_addr is in sb_instructions we're done and can exit the loop. That worked.
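
The scan can be illustrated with a toy model. This is a sketch under heavy assumptions, not the patch's code: the "instruction set" here is invented (the low two bits of the first byte give the length minus one), and `decode_reaches` and `find_synchronized_start` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy encoding, invented for illustration: the low two bits of an
// instruction's first byte give its length minus one (lengths 1..4).
inline std::size_t toy_length(std::uint8_t first_byte) {
  return (first_byte & 0x3u) + 1;
}

// Decodes linearly from `start` and reports whether some instruction begins
// exactly at `target`. Assumes target <= mem.size().
inline bool decode_reaches(const std::vector<std::uint8_t> &mem,
                           std::size_t start, std::size_t target) {
  std::size_t pos = start;
  while (pos < target && pos < mem.size())
    pos += toy_length(mem[pos]);
  return pos == target;
}

// Tries start, start + 1, ... and returns the first offset from which a
// linear decode lands exactly on `target`.
inline std::optional<std::size_t>
find_synchronized_start(const std::vector<std::uint8_t> &mem,
                        std::size_t start, std::size_t target) {
  for (std::size_t s = start; s <= target; ++s)
    if (decode_reaches(mem, s, target))
      return s;
  return std::nullopt;
}
```

With `mem = {0x03, 0x03, 0xBB, 0xCC, 0x01, 0xDD, 0x00}` and target 6, decoding from offset 1 overshoots the target, and the scan settles on offset 2. In this toy stream the real boundaries are 0, 4 and 6, so the scan reaches the target while still decoding a bogus leading instruction.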

Another similar thing that can be done is to start from start_sbaddr as you were saying, increment the address until a valid section is found. Then call _get_instructions_from_memory() passing the section start.
What do you think? Delegating the disassembling to ReadMemory() + GetInstructions() looks simpler to me than to manually iterate over sections and get instructions from symbols and functions.
Is there any shortcoming I'm not seeing?


clayborgUnsubmitted

Not Done

so your

for (unsigned i = 0; i < max_instruction_size; i++) {

disassembles and tries to make sure you make it back to the original base address from the original disassemble packet? That can work but could be a bit time consuming?

The main issue, as you saw on x86, is you don't know what is in memory. You could have unreadable areas of memory when trying to disassemble. Also if you do have good memory that does contain instructions, there can be padding in between functions or even data between functions that the function accesses that can't be correctly disassembled and could throw things off again.

The memory regions are the safest way to traverse memory to see what you have and would help you deal with holes in the memory. You can ask about a memory region with:

lldb::SBError SBProcess::GetMemoryRegionInfo(lldb::addr_t load_addr, lldb::SBMemoryRegionInfo &region_info);

If you ask about an invalid address, like zero on most platforms, it will return a valid "region_info" with the bounds of the unreadable address range filled in and no read/write/execute permissions:

(lldb) script
Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.
>>> region = lldb.SBMemoryRegionInfo()
>>> err = lldb.process.GetMemoryRegionInfo(0, region)
>>> print(region)
[0x0000000000000000-0x0000000100000000 ---]

So you could use the memory region info to your advantage here. If you have execute permissions, disassemble as instructions, and if you don't emit ".byte 0xXX" for each byte. If there are no permissions, you can emit some other string like "0x00000000: <invalid memory>".
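
As a sketch of that decision, assuming the three flags have already been read from the region info (for instance via `IsReadable()`, `IsWritable()` and `IsExecutable()`), the choice of output could be expressed like this. `RegionPermissions`, `RenderMode` and `choose_render_mode` are hypothetical names, not part of LLDB or the patch.

```cpp
// Hypothetical plain-data copy of a region's permission flags.
struct RegionPermissions {
  bool readable;
  bool writable;
  bool executable;
};

// How a range of memory should be presented in the disassembly view.
enum class RenderMode {
  Instructions,  // disassemble as code
  RawBytes,      // emit ".byte 0xXX" per byte
  InvalidMemory, // emit a placeholder such as "<invalid memory>"
};

// Follows the rule of thumb from the comment: no permissions at all means
// invalid memory, execute permission means code, anything else is data.
inline RenderMode choose_render_mode(const RegionPermissions &perms) {
  if (!perms.readable && !perms.writable && !perms.executable)
    return RenderMode::InvalidMemory;
  if (perms.executable)
    return RenderMode::Instructions;
  return RenderMode::RawBytes;
}
```

This only selects a presentation; an executable region can still hold non-code bytes, so the permission check is a first filter and not a guarantee.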

That being said, even when you do find an executable section of memory, there can be different stuff there even if a section _is_ executable. For instance, if we ask about the next memory region on a mac M1:

>>> err = lldb.process.GetMemoryRegionInfo(0x0000000100000000, region)
>>> print(region)
[0x0000000100000000-0x0000000100004000 R-X]

Notice that this is read + execute. You can access these via:

region.IsReadable()
region.IsWritable()
region.IsExecutable()

But at this address, this is the mach-o header which doesn't make sense to try and disassemble:

(lldb) memory read 0x0000000100000000
0x100000000: cf fa ed fe 0c 00 00 01 00 00 00 00 02 00 00 00  ................
0x100000010: 11 00 00 00 18 03 00 00 85 00 20 00 00 00 00 00  .......... .....
(lldb) memory read -fx -s4 -c 4 0x0000000100000000
0x100000000: 0xfeedfacf 0x0100000c 0x00000000 0x00000002

0xfeedfacf is the mach-o magic bytes for little endian 64 bit mach-o files.
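
The relationship between the raw bytes `cf fa ed fe` and the value 0xfeedfacf can be checked with a small little-endian read. `read_le32` and `kMachO64Magic` are names chosen for this sketch.

```cpp
#include <cstdint>

// Magic value quoted in the comment above for 64-bit Mach-O files.
constexpr std::uint32_t kMachO64Magic = 0xfeedfacfu;

// Assembles four bytes stored in little-endian order into a 32-bit value.
// Each byte is widened before shifting to avoid signed overflow.
inline std::uint32_t read_le32(const std::uint8_t *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Applied to the first two words of the `memory read` output, this gives 0xfeedfacf and 0x0100000c, matching the `-fx -s4` view.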

(lldb) disassemble --start-address 0x0000000100000000
a.out`_mh_execute_header:
    0x100000000 <+0>:  .long  0xfeedfacf                ; unknown opcode
    0x100000004 <+4>:  .long  0x0100000c                ; unknown opcode
    0x100000008 <+8>:  udf    #0x0
    0x10000000c <+12>: udf    #0x2
    0x100000010 <+16>: udf    #0x11
    0x100000014 <+20>: udf    #0x318
    0x100000018 <+24>: .long  0x00200085                ; unknown opcode
    0x10000001c <+28>: udf    #0x0

So this is the main reason why I would suggest just disassembling using .byte or .long when we aren't in a function or symbol.


eloparcoAuthorUnsubmitted

Done

Understood. In that case we also have to do the same for _handle_disassemble_positive_offset, because even there, when we use ReadInstructions(), we could have some unreadable areas in the middle, and ReadInstructions() would stop prematurely when running into one of them, resulting in fewer instructions returned than expected.


response_instructions.emplace_back(std::move(disass_instr));

}

return response_instructions;

}

auto _handle_disassemble_negative_offset(

lldb::addr_t base_addr, int64_t instruction_offset,

uint64_t instruction_count,

std::optional<llvm::StringRef> memory_reference) {

llvm::json::Array response_instructions;

const auto bytes_per_instruction = g_vsc.target.GetMaximumOpcodeByteSize();

const auto bytes_offset = -instruction_offset * bytes_per_instruction;

auto start_addr = base_addr - bytes_offset;

const auto disassemble_bytes = instruction_count * bytes_per_instruction;

clayborgUnsubmitted

Done

clayborg: Remove any and all printf, or fprintf statements. You can't print anything to stderr or stdout…

eloparcoAuthorUnsubmitted

Done

I suppose I have to replace llvm::errs() too, right?


clayborgUnsubmitted

Not Done

Yes! The main issue is: will the user expect to see this output in the console, and will it make sense to the user? I don't know what the user will think if they see "current line not found in disassembled instructions" in the debug console. That goes for all output to the console: it has to make sense to the user. I don't know if the user will care about this message or be able to do anything about it. It also isn't prefixed with "warning:" or "error:". I would vote to remove it.


eloparcoAuthorUnsubmitted

Done

Right, it is just a print to debug, is there anything else I can use for that purpose?
Otherwise I'll just get rid of it as you were saying.


eloparcoAuthorUnsubmitted

Done

Probably better to remove it rather than prefixing the message with "debug"


clayborgUnsubmitted

Done

Use C++ comments instead of C style comments.


// Get beginning of current stack frame to avoid reading outside of it

const auto frame_boundaries = _get_frame_boundary();

const auto low_pc = frame_boundaries.first;

const auto high_pc = frame_boundaries.second;

if (low_pc == LLDB_INVALID_ADDRESS)

return response_instructions;

if (start_addr < low_pc)

start_addr = low_pc;

auto sb_instructions =

_get_instructions_from_memory(start_addr, disassemble_bytes, high_pc);

// Find position of requested instruction

// in retrieved disassembled instructions

auto index = sb_instructions.size() + 1;

for (size_t i = 0; i < sb_instructions.size(); i++) {

if (sb_instructions[i].GetAddress().GetLoadAddress(g_vsc.target) ==

hex_string_to_addr(memory_reference)) {

index = i;

break;

}

if (index == sb_instructions.size() + 1) {

fprintf(stderr, "current line not found in disassembled instructions\n");

return response_instructions;

}

// Copy instructions into queue to easily manipulate them

std::deque<DisassembledInstruction> disass_instructions;

for (auto &instr : sb_instructions)

disass_instructions.emplace_back(DisassembledInstruction(instr));

// Make sure the address in the disassemble request is at the right position

const uint64_t expected_index = -instruction_offset;

if (index < expected_index) {

for (uint64_t i = 0; i < (expected_index - index); i++) {

DisassembledInstruction nop_instruction;

disass_instructions.emplace_front(nop_instruction);

}

} else if (index > expected_index) {

const auto num_instr_to_remove = index - expected_index;

disass_instructions.erase(disass_instructions.begin(),

disass_instructions.begin() +

num_instr_to_remove);

}

// Truncate if too many instructions

if (disass_instructions.size() > instruction_count) {

disass_instructions.erase(disass_instructions.begin() + instruction_count,

disass_instructions.end());

clayborgUnsubmitted

Done

auto base_addr = hex_string_to_addr(memory_reference);

- if (hex_string_to_addr(memory_reference) == 0) {

+ if (base_addr == 0) {

success = false;


}

assert(disass_instructions.size() > expected_index &&

disass_instructions[expected_index].m_address ==

memory_reference.value());

for (auto &instr : disass_instructions)

response_instructions.emplace_back(CreateDisassembledInstruction(instr));

return response_instructions;

}

void request_disassemble(const llvm::json::Object &request) {

llvm::json::Object response;

lldb::SBError error;

FillResponse(request, response);

auto arguments = request.getObject("arguments");

const auto memory_reference = arguments->getString("memoryReference");

const auto instruction_offset = GetSigned(arguments, "instructionOffset", 0);

const auto instruction_count = GetUnsigned(arguments, "instructionCount", 0);

llvm::json::Array response_instructions;

auto base_addr = hex_string_to_addr(memory_reference);

base_addr += instruction_offset;

fprintf(stdout, "disassemble -> pc=%s off=%lld count=%llu\n",

memory_reference->data(), instruction_offset, instruction_count);

bool success = true;

if (hex_string_to_addr(memory_reference) == 0) {

success = false;

fprintf(stderr, "requested memory reference is nop\n");

} else {

response_instructions =

instruction_offset >= 0

? _handle_disassemble_positive_offset(base_addr, instruction_offset,

instruction_count)

: _handle_disassemble_negative_offset(base_addr, instruction_offset,

instruction_count,

memory_reference);

}

// Add padding if not enough instructions

if (response_instructions.size() < instruction_count) {

const auto padding_len = instruction_count - response_instructions.size();

for (size_t i = 0; i < padding_len; i++) {

const DisassembledInstruction nop_instruction;

auto disass_instr = CreateDisassembledInstruction(nop_instruction);

response_instructions.emplace_back(std::move(disass_instr));

}

assert((response_instructions.size() == instruction_count) &&

"should return exact number of requested instructions");

llvm::json::Object body;

body.try_emplace("instructions", std::move(response_instructions));

response.try_emplace("body", std::move(body));

response["success"] = llvm::json::Value(success);

g_vsc.SendJSON(llvm::json::Value(std::move(response)));

}

// "SetExceptionBreakpointsRequest": { // "SetExceptionBreakpointsRequest": {

// "allOf": [ { "$ref": "#/definitions/Request" }, { // "allOf": [ { "$ref": "#/definitions/Request" }, {

// "type": "object", // "type": "object",

// "description": "SetExceptionBreakpoints request; value of command field // "description": "SetExceptionBreakpoints request; value of command field

// is 'setExceptionBreakpoints'. The request configures the debuggers // is 'setExceptionBreakpoints'. The request configures the debuggers

// response to thrown exceptions. If an exception is configured to break, a // response to thrown exceptions. If an exception is configured to break, a

// StoppedEvent is fired (event type 'exception').", "properties": { // StoppedEvent is fired (event type 'exception').", "properties": {

// "command": { // "command": {

(358 lines not shown)

void request_stackTrace(const llvm::json::Object &request) {
  ...
  if (thread.IsValid()) {
    const auto startFrame = GetUnsigned(arguments, "startFrame", 0);
    const auto levels = GetUnsigned(arguments, "levels", 0);
    const auto endFrame = (levels == 0) ? INT64_MAX : (startFrame + levels);
    for (uint32_t i = startFrame; i < endFrame; ++i) {
      auto frame = thread.GetFrameAtIndex(i);
      if (!frame.IsValid())
        break;
      if (i == 0) // Current stack frame
        g_curr_frame = frame;
      stackFrames.emplace_back(CreateStackFrame(frame));
    }
    const auto totalFrames = thread.GetNumFrames();
    body.try_emplace("totalFrames", totalFrames);
  }
  body.try_emplace("stackFrames", std::move(stackFrames));
  response.try_emplace("body", std::move(body));
  g_vsc.SendJSON(llvm::json::Value(std::move(response)));

(578 lines not shown)

  g_vsc.RegisterRequestCallback("setFunctionBreakpoints",
                                request_setFunctionBreakpoints);
  g_vsc.RegisterRequestCallback("setVariable", request_setVariable);
  g_vsc.RegisterRequestCallback("source", request_source);
  g_vsc.RegisterRequestCallback("stackTrace", request_stackTrace);
  g_vsc.RegisterRequestCallback("stepIn", request_stepIn);
  g_vsc.RegisterRequestCallback("stepOut", request_stepOut);
  g_vsc.RegisterRequestCallback("threads", request_threads);
  g_vsc.RegisterRequestCallback("variables", request_variables);
  g_vsc.RegisterRequestCallback("disassemble", request_disassemble);
  // Custom requests
  g_vsc.RegisterRequestCallback("compileUnits", request_compileUnits);
  g_vsc.RegisterRequestCallback("modules", request_modules);
  // Testing requests
  g_vsc.RegisterRequestCallback("_testGetTargetBreakpoints",
                                request__testGetTargetBreakpoints);
}
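
The registration list maps each DAP command name to a handler function, and the patch adds "disassemble" to that table. A rough standalone sketch of such a name-to-handler registry is given here for readers unfamiliar with the pattern. `RequestRegistry`, `Register`, `Dispatch`, and the string-based handler signature are hypothetical; the real adapter passes llvm::json::Object requests, and its internal bookkeeping is not shown in this diff.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the adapter's request registry.
class RequestRegistry {
public:
  // Assumption: handlers take a string argument and return a string reply.
  using Handler = std::function<std::string(const std::string &)>;

  // Associates `command` with `handler`, replacing any earlier registration.
  void Register(const std::string &command, Handler handler) {
    handlers_[command] = std::move(handler);
  }

  // Looks up `command`; on success stores the handler's reply in `out`.
  // Returns false when no handler has been registered for the command.
  bool Dispatch(const std::string &command, const std::string &args,
                std::string &out) const {
    auto it = handlers_.find(command);
    if (it == handlers_.end())
      return false;
    out = it->second(args);
    return true;
  }

private:
  std::map<std::string, Handler> handlers_;
};
```

With a table like this, supporting a new request only needs one extra registration call, which matches the single added line for "disassemble" in the hunk.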

(214 lines not shown)


[lldb-vscode] Add support for disassembly view
Needs Review | Public


Revision Contents

Diff 484140

lldb/include/lldb/API/SBTarget.h

lldb/source/API/SBTarget.cpp

lldb/tools/lldb-vscode/CMakeLists.txt

lldb/tools/lldb-vscode/DisassembledInstruction.h

lldb/tools/lldb-vscode/DisassembledInstruction.cpp

lldb/tools/lldb-vscode/JSONUtils.h

lldb/tools/lldb-vscode/JSONUtils.cpp

lldb/tools/lldb-vscode/LLDBUtils.h

lldb/tools/lldb-vscode/LLDBUtils.cpp

lldb/tools/lldb-vscode/VSCode.h

lldb/tools/lldb-vscode/VSCodeForward.h

lldb/tools/lldb-vscode/lldb-vscode.cpp
