This is an archive of the discontinued LLVM Phabricator instance.

[RFC] When calling the process mmap try to call all found instead of just the first one
Needs ReviewPublic

Authored by aadsm on Sep 17 2020, 4:10 PM.

Details

Summary

I found an issue where the expression evaluation doesn't work (for non trivial exprs) when the leak sanitizer is enabled in the process being debugged.
lldb defaults to interpertation because it thinks it can't JIT it (and most interesting exprs can't be interpreted). The JIT check fails because calling the process's mmap fails.

I debugged the InferiorCallMmap function to figure this out. This function finds all functions with mmap name in it and executes the first one in that list.
When asan is enabled (and by default on linux it turns on the leak sanitizer) it introduces a new mmap symbol in the binary (the only that will be called instead of the real one). This can be found here: https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/sanitizer_common/symbolizer/sanitizer_wrappers.cpp#L155-L160. And there is also another new function named __interceptor_mmap which registers the call for the sanitizer I think. You can find that one here (I think): https://github.com/llvm/llvm-project/blob/6148cca70888ead020a808279043fd013ca72a2a/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc#L7304-L7313.

When lldb calls the redefined mmap it will return -1 but will be able to allocate memory if it calls the ___interceptor_mmap though. Both functions end up calling __internal_mmap but the big difference between the two is that the redefined mmap also checks the return of internal_mmap with internal_iserror and will return -1 when it fails the check. Here's the code for that: https://github.com/llvm/llvm-project/blob/6148cca70888ead020a808279043fd013ca72a2a/compiler-rt/lib/sanitizer_common/sanitizer_syscall_linux_x86_64.inc#L83-L90. Not really sure why to be honest. I tried to debug but it was not simple inside the ThreadPlanCallFunction, I tried to enable SINGLE_STEP_EXPRESSIONS but then the process kind of dies on me.

I came up with a simple fix to get around this which is to try each mmap we find (and not only the first one) until we find one that works. Another possible solution is to see if internal_mmap exists and call that one if it does.. I feel it's a bit more implementation dependent though, while the loop sounds more future proof (I'm assuming calling a bunch of mmaps until one actually works won't launch missiles).

Thoughts?

(This implemention is not final, it should be possible to move a bunch of things out of the loop I believe)
Not really sure who to add for review so adding a few people I've seen in the code.

Diff Detail

Event Timeline

aadsm created this revision.Sep 17 2020, 4:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 17 2020, 4:10 PM
aadsm requested review of this revision.Sep 17 2020, 4:10 PM
aadsm edited the summary of this revision. (Show Details)Sep 17 2020, 4:14 PM
shafik added a reviewer: vsk.

Calling every symbol called mmap you find around till one returns some value makes me a bit nervous. For instance, on macOS dyld has a symbol called "mmap" that isn't meant to be called by anybody but dyld. We probably don't want to accidentally call that.

You can get around that particular instance by only looking for exported functions (macOS's internal mmap is not exported). I can't think of any good reason we would want to call an unexported mmap, since we're trying to do what a program on the system would do.

vsk added a comment.Sep 21 2020, 2:57 PM

It seems like calling any 'mmap' definition should work. Is the interposed mmap implementation failing and correctly returning -1, or is it succeeding and incorrectly returning -1? In either case, it seems like it's worth digging into why it's failing / returning the wrong result. (Just my two cents - I'm not really familiar with this code, so take it with a grain of salt :)

Should lldb avoid calling asan (or any other sanitizer's) mmap? If so, maybe this function can be sanitizer aware and ignore those symbols.

Should lldb avoid calling asan (or any other sanitizer's) mmap? If so, maybe this function can be sanitizer aware and ignore those symbols.

At present, this is a function that gets used for any Posix, and I don't think it can know about sanitizers. If you want to do something of this sort, then the Platform should be asked for the library containing the canonical mmap, and we can load it from there. That would have the nice effect that we wouldn't search the whole symbol world for mmap, and allow the Platform to pick the right one.

So sounds like first we should verify if the "mmap" for asan is exported or not. If it isn't then this is an easy fix, iterate through all possible matches and avoid any internal versions of mmap and only call externally visible symbols. Antonio, can you verify the visibility of the symbol?

bool Symbol::IsExternal() const { return m_is_external; }
aadsm added a comment.Sep 21 2020, 9:08 PM

It seems like calling any 'mmap' definition should work. Is the interposed mmap implementation failing and correctly returning -1, or is it succeeding and incorrectly returning -1? In either case, it seems like it's worth digging into why it's failing / returning the wrong result

@vsk, yeah, this is what I've tried to do before I submitted this diff. It should work because it works when lldb is not attached. I wanted to step into it (by defining SINGLE_STEP_EXPRESSIONS) and see what's going on but the process crashes on me when I do that. It has to work just because if I call __interceptor_mmap it works and the only difference between those two is that one calls internal_iserror. InferiorCallMmap uses ThreadPlanCallFunction which seems to ignore breakpoints because it uses the subplan "run to address" that makes sure it only stops at the address given so it has been hard to put a breakpoint there. The only exception to this is when I define the SINGLE_STEP_EXPRESSIONS but like I said the process crashes or something, don't remember anymore. I can try again.

@clayborg I'll try that.

Thanks everyone for their input, I'll follow up on this next week because right now I'm on vacation.

labath added a subscriber: labath.Sep 30 2020, 7:13 AM

A completely different approach would be to avoid the mmap function completely, and go for the mmap syscall instead.

That is, instead of setting up registers to fake call to mmap and doing a run-to entry point breakpoint, we could set them up to fake a syscall, and then do an instruction-step over a syscall instruction (which we could place at the entry point, or find a suitable one in the binary).

The advantage of that would be that this would work not only in this (sanitizer) case, but also in all other cases where an mmap symbol is not present/functional/unambiguous:

  • a bare-bone statically linked binary need not contain an mmap function
  • very early in the program startup (before relocations are applied) it may not be safe to call the global mmap
  • mmap may be buggy (libc debugging?)

Note that this would not need to be implemented in the lldb client. This sort of thing would be natural to implement in lldb server in response to the _M packet. There it would be easy to encode the abi details needed to issue a syscall. The client already prefers this packet, and the existing code could remain as a fallback for platforms not implementing it.

A completely different approach would be to avoid the mmap function completely, and go for the mmap syscall instead.

That is, instead of setting up registers to fake call to mmap and doing a run-to entry point breakpoint, we could set them up to fake a syscall, and then do an instruction-step over a syscall instruction (which we could place at the entry point, or find a suitable one in the binary).

The advantage of that would be that this would work not only in this (sanitizer) case, but also in all other cases where an mmap symbol is not present/functional/unambiguous:

  • a bare-bone statically linked binary need not contain an mmap function
  • very early in the program startup (before relocations are applied) it may not be safe to call the global mmap
  • mmap may be buggy (libc debugging?)

That is possible, though how do we figure out the syscall enumeration for the mmap syscall reliably for a given system? And if we are remote debugging? Seems like we are signing up for architecture specific syscalls on every different architecture. Not sure I would go that route, but I am open to seeing a solution before passing judgement.

Note that this would not need to be implemented in the lldb client. This sort of thing would be natural to implement in lldb server in response to the _M packet. There it would be easy to encode the abi details needed to issue a syscall. The client already prefers this packet, and the existing code could remain as a fallback for platforms not implementing it.

There is no code currently that does any run control down inside of lldb-server and I would like to keep it that way if possible. debugserver can do it only because we have the task port to the program we are debugging and we can call a function in the debugserver process that does the memory allocation in the debug process. Having this done in lldb-server would require lldb-server to perform run control by changing register values, running the syscall, stopping at a breakpoint to after the syscall has run, removing that breakpoint only if it didn't exist already. Seems like a lot of dangerous flow control that we won't be able to see if anything goes wrong. Right now if we are doing it all through the GDB remote protocol, we can see exactly how things go wrong in the packet log, but now it would be a mystery if things go wrong.

labath added a comment.Oct 1 2020, 1:00 AM

That is possible, though how do we figure out the syscall enumeration for the mmap syscall reliably for a given system? And if we are remote debugging? Seems like we are signing up for architecture specific syscalls on every different architecture. Not sure I would go that route, but I am open to seeing a solution before passing judgement.

Well.. we already need to "know" the right target-specific values for the various PROT_ and MAP_ flags, so I don't think including the syscall number (and the syscall opcode) would not be too much of a stretch.

Note that this would not need to be implemented in the lldb client. This sort of thing would be natural to implement in lldb server in response to the _M packet. There it would be easy to encode the abi details needed to issue a syscall. The client already prefers this packet, and the existing code could remain as a fallback for platforms not implementing it.

There is no code currently that does any run control down inside of lldb-server and I would like to keep it that way if possible. debugserver can do it only because we have the task port to the program we are debugging and we can call a function in the debugserver process that does the memory allocation in the debug process. Having this done in lldb-server would require lldb-server to perform run control by changing register values, running the syscall, stopping at a breakpoint to after the syscall has run, removing that breakpoint only if it didn't exist already. Seems like a lot of dangerous flow control that we won't be able to see if anything goes wrong. Right now if we are doing it all through the GDB remote protocol, we can see exactly how things go wrong in the packet log, but now it would be a mystery if things go wrong.

I do share that sentiment, and if the setup needed for this would be anything like the mmap dance, I wouldn't even try it. The only reason I brought this up is because I expect the syscall routine to be much simpler (like, maybe simpler than the instruction emulation routine that we do on arm). There's no need to set breakpoints, as we're just PTRACE_SINGLESTEPing over a single instruction (unless we're on arm of course). So the implementation would be something like:

  • save all registers
  • find first executable page
  • write "int 0x80" to the first two bytes
  • setup registers for a syscall (including pointing pc to the int 0x80 instruction)
  • PTRACE_SINGLESTEP+waitpid
  • fetch result from %rax
  • restore bytes overwritten by the int 0x80
  • restore all registers

I don't think this would be too complicated, though it would obviously have to be done with a steady hand. In fact, one of the original use cases for the linux ptrace syscall was the ability for the tracer to track/replace/emulate syscalls done by the traced process. Unfortunately, the intended use was a bit different (things like qemu and user-mode linux) and it assumes the tracer wants to modify a syscall that the tracee already wanted to make (and not inject a new one "out of the blue"). This means we cannot use the existing support for that. That said, I think I've managed to convince the NetBSD folks to include out-of-blue syscall support in their ptrace implementation. Maybe I should try the same for linux... :P

That is possible, though how do we figure out the syscall enumeration for the mmap syscall reliably for a given system? And if we are remote debugging? Seems like we are signing up for architecture specific syscalls on every different architecture. Not sure I would go that route, but I am open to seeing a solution before passing judgement.

Well.. we already need to "know" the right target-specific values for the various PROT_ and MAP_ flags, so I don't think including the syscall number (and the syscall opcode) would not be too much of a stretch.

Note that this would not need to be implemented in the lldb client. This sort of thing would be natural to implement in lldb server in response to the _M packet. There it would be easy to encode the abi details needed to issue a syscall. The client already prefers this packet, and the existing code could remain as a fallback for platforms not implementing it.

There is no code currently that does any run control down inside of lldb-server and I would like to keep it that way if possible. debugserver can do it only because we have the task port to the program we are debugging and we can call a function in the debugserver process that does the memory allocation in the debug process. Having this done in lldb-server would require lldb-server to perform run control by changing register values, running the syscall, stopping at a breakpoint to after the syscall has run, removing that breakpoint only if it didn't exist already. Seems like a lot of dangerous flow control that we won't be able to see if anything goes wrong. Right now if we are doing it all through the GDB remote protocol, we can see exactly how things go wrong in the packet log, but now it would be a mystery if things go wrong.

I do share that sentiment, and if the setup needed for this would be anything like the mmap dance, I wouldn't even try it. The only reason I brought this up is because I expect the syscall routine to be much simpler (like, maybe simpler than the instruction emulation routine that we do on arm). There's no need to set breakpoints, as we're just PTRACE_SINGLESTEPing over a single instruction (unless we're on arm of course). So the implementation would be something like:

  • save all registers
  • find first executable page
  • write "int 0x80" to the first two bytes
  • setup registers for a syscall (including pointing pc to the int 0x80 instruction)
  • PTRACE_SINGLESTEP+waitpid
  • fetch result from %rax
  • restore bytes overwritten by the int 0x80
  • restore all registers

I don't think this would be too complicated, though it would obviously have to be done with a steady hand. In fact, one of the original use cases for the linux ptrace syscall was the ability for the tracer to track/replace/emulate syscalls done by the traced process. Unfortunately, the intended use was a bit different (things like qemu and user-mode linux) and it assumes the tracer wants to modify a syscall that the tracee already wanted to make (and not inject a new one "out of the blue"). This means we cannot use the existing support for that. That said, I think I've managed to convince the NetBSD folks to include out-of-blue syscall support in their ptrace implementation. Maybe I should try the same for linux... :P

If we can truly get away with the single stepping over a single instructions, then I could see this working. Many ARM chips don't support hardware single stepping, so software single stepping would need to be used and that would complicate things. This is arch specific, so each arch would need to know how to do this. Happy to review a patch and see if we can make it work. Might be nice to have a setting for this in case it causes problems to stop it from being attempted:

(lldb) setting set plugin.process.gdb-remote.use-allocate-memory-packets false
labath added a comment.Oct 2 2020, 6:42 AM

That sounds like a plan. FWIW, here's the implementation I hacked up today:

Status NativeProcessLinux::AllocateMemory(size_t size, uint32_t permissions,
                                          lldb::addr_t &addr) {
  PopulateMemoryRegionCache();
  auto region_it = llvm::find_if(m_mem_region_cache, [](const auto &pair) {
    return pair.first.GetExecutable() == MemoryRegionInfo::eYes;
  });
  if (region_it == m_mem_region_cache.end())
    return Status("No executable memory region found!");
  addr_t exe_addr = region_it->first.GetRange().GetRangeBase();

  NativeThreadLinux &thread = *GetThreadByID(GetID());
  assert(thread.GetState() == eStateStopped);

  int prot = PROT_NONE;
  if (permissions & ePermissionsReadable)
    prot |= PROT_READ;
  if (permissions & ePermissionsWritable)
    prot |= PROT_WRITE;
  if (permissions & ePermissionsExecutable)
    prot |= PROT_EXEC;

  NativeRegisterContextLinux &reg_ctx = thread.GetRegisterContext();
  DataBufferSP registers_sp;
  reg_ctx.ReadAllRegisterValues(registers_sp);
  uint8_t memory[2];
  size_t bytes_read;
  ReadMemory(exe_addr, memory, 2, bytes_read);

  reg_ctx.SetPC(exe_addr);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rax_x86_64, SYS_mmap);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rdi_x86_64, 0);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rsi_x86_64, size);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rdx_x86_64, prot);
  reg_ctx.WriteRegisterFromUnsigned(lldb_r10_x86_64,
                                    MAP_ANONYMOUS | MAP_PRIVATE);
  reg_ctx.WriteRegisterFromUnsigned(lldb_r8_x86_64, -1);
  reg_ctx.WriteRegisterFromUnsigned(lldb_r9_x86_64, 0);
  WriteMemory(exe_addr, "\x0f\x05", 2, bytes_read);
  PtraceWrapper(PTRACE_SINGLESTEP, thread.GetID(), nullptr, nullptr);
  int status;
  ::pid_t wait_pid = llvm::sys::RetryAfterSignal(-1, ::waitpid, thread.GetID(),
                                                 &status, __WALL);
  assert((unsigned)wait_pid == thread.GetID());
  addr = reg_ctx.ReadRegisterAsUnsigned(lldb_rax_x86_64, -ESRCH);

  WriteMemory(exe_addr, memory, 2, bytes_read);
  reg_ctx.WriteAllRegisterValues(registers_sp);

  if (addr > -4096)
    return Status(-addr, eErrorTypePOSIX);
  return Status();
}

It needs more error handling, and generalization to non-x86 architectures, but it actually works, and is enough to make all but one test pass (TestBitfields.py seems to fail due to some data corruption -- quite puzzling).

That sounds like a plan. FWIW, here's the implementation I hacked up today:

Status NativeProcessLinux::AllocateMemory(size_t size, uint32_t permissions,
                                          lldb::addr_t &addr) {
  PopulateMemoryRegionCache();
  auto region_it = llvm::find_if(m_mem_region_cache, [](const auto &pair) {
    return pair.first.GetExecutable() == MemoryRegionInfo::eYes;
  });
  if (region_it == m_mem_region_cache.end())
    return Status("No executable memory region found!");
  addr_t exe_addr = region_it->first.GetRange().GetRangeBase();

  NativeThreadLinux &thread = *GetThreadByID(GetID());
  assert(thread.GetState() == eStateStopped);

  int prot = PROT_NONE;
  if (permissions & ePermissionsReadable)
    prot |= PROT_READ;
  if (permissions & ePermissionsWritable)
    prot |= PROT_WRITE;
  if (permissions & ePermissionsExecutable)
    prot |= PROT_EXEC;

  NativeRegisterContextLinux &reg_ctx = thread.GetRegisterContext();
  DataBufferSP registers_sp;
  reg_ctx.ReadAllRegisterValues(registers_sp);
  uint8_t memory[2];
  size_t bytes_read;
  ReadMemory(exe_addr, memory, 2, bytes_read);

  reg_ctx.SetPC(exe_addr);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rax_x86_64, SYS_mmap);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rdi_x86_64, 0);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rsi_x86_64, size);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rdx_x86_64, prot);
  reg_ctx.WriteRegisterFromUnsigned(lldb_r10_x86_64,
                                    MAP_ANONYMOUS | MAP_PRIVATE);
  reg_ctx.WriteRegisterFromUnsigned(lldb_r8_x86_64, -1);
  reg_ctx.WriteRegisterFromUnsigned(lldb_r9_x86_64, 0);
  WriteMemory(exe_addr, "\x0f\x05", 2, bytes_read);
  PtraceWrapper(PTRACE_SINGLESTEP, thread.GetID(), nullptr, nullptr);
  int status;
  ::pid_t wait_pid = llvm::sys::RetryAfterSignal(-1, ::waitpid, thread.GetID(),
                                                 &status, __WALL);
  assert((unsigned)wait_pid == thread.GetID());
  addr = reg_ctx.ReadRegisterAsUnsigned(lldb_rax_x86_64, -ESRCH);

  WriteMemory(exe_addr, memory, 2, bytes_read);
  reg_ctx.WriteAllRegisterValues(registers_sp);

  if (addr > -4096)
    return Status(-addr, eErrorTypePOSIX);
  return Status();
}

It needs more error handling, and generalization to non-x86 architectures, but it actually works, and is enough to make all but one test pass (TestBitfields.py seems to fail due to some data corruption -- quite puzzling).

Very cool. That seems simple enough to allow us to try this out, at least on unix variants. Does PTRACE_SINGLESTEP cause _only_ the current thread to single step and keep all other threads paused? We can't allow any other threads making any progress when the process is briefly resumed in a multi-threaded process.

This nice thing is lldb-server _is_ compiled natively for each system so we can rely on system headers for the sys call numbers if they are available. So looks like this could be made to work. Did you also do the deallocate memory? I ran a few expressions and I am not sure that we ever use the deallocate memory packets to debugserver on mac, so it might not be needed.

That sounds like a plan. FWIW, here's the implementation I hacked up today:

Status NativeProcessLinux::AllocateMemory(size_t size, uint32_t permissions,
                                          lldb::addr_t &addr) {
  PopulateMemoryRegionCache();
  auto region_it = llvm::find_if(m_mem_region_cache, [](const auto &pair) {
    return pair.first.GetExecutable() == MemoryRegionInfo::eYes;
  });
  if (region_it == m_mem_region_cache.end())
    return Status("No executable memory region found!");
  addr_t exe_addr = region_it->first.GetRange().GetRangeBase();

  NativeThreadLinux &thread = *GetThreadByID(GetID());
  assert(thread.GetState() == eStateStopped);

  int prot = PROT_NONE;
  if (permissions & ePermissionsReadable)
    prot |= PROT_READ;
  if (permissions & ePermissionsWritable)
    prot |= PROT_WRITE;
  if (permissions & ePermissionsExecutable)
    prot |= PROT_EXEC;

  NativeRegisterContextLinux &reg_ctx = thread.GetRegisterContext();
  DataBufferSP registers_sp;
  reg_ctx.ReadAllRegisterValues(registers_sp);
  uint8_t memory[2];
  size_t bytes_read;
  ReadMemory(exe_addr, memory, 2, bytes_read);

  reg_ctx.SetPC(exe_addr);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rax_x86_64, SYS_mmap);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rdi_x86_64, 0);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rsi_x86_64, size);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rdx_x86_64, prot);
  reg_ctx.WriteRegisterFromUnsigned(lldb_r10_x86_64,
                                    MAP_ANONYMOUS | MAP_PRIVATE);
  reg_ctx.WriteRegisterFromUnsigned(lldb_r8_x86_64, -1);
  reg_ctx.WriteRegisterFromUnsigned(lldb_r9_x86_64, 0);
  WriteMemory(exe_addr, "\x0f\x05", 2, bytes_read);
  PtraceWrapper(PTRACE_SINGLESTEP, thread.GetID(), nullptr, nullptr);
  int status;
  ::pid_t wait_pid = llvm::sys::RetryAfterSignal(-1, ::waitpid, thread.GetID(),
                                                 &status, __WALL);
  assert((unsigned)wait_pid == thread.GetID());
  addr = reg_ctx.ReadRegisterAsUnsigned(lldb_rax_x86_64, -ESRCH);

  WriteMemory(exe_addr, memory, 2, bytes_read);
  reg_ctx.WriteAllRegisterValues(registers_sp);

  if (addr > -4096)
    return Status(-addr, eErrorTypePOSIX);
  return Status();
}

It needs more error handling, and generalization to non-x86 architectures, but it actually works, and is enough to make all but one test pass (TestBitfields.py seems to fail due to some data corruption -- quite puzzling).

Very cool. That seems simple enough to allow us to try this out, at least on unix variants. Does PTRACE_SINGLESTEP cause _only_ the current thread to single step and keep all other threads paused? We can't allow any other threads making any progress when the process is briefly resumed in a multi-threaded process.

This nice thing is lldb-server _is_ compiled natively for each system so we can rely on system headers for the sys call numbers if they are available. So looks like this could be made to work. Did you also do the deallocate memory? I ran a few expressions and I am not sure that we ever use the deallocate memory packets to debugserver on mac, so it might not be needed.

We deallocate memory when we remove the space we've made for expression result variables and the like after running expressions.

aadsm updated this revision to Diff 296220.Oct 5 2020, 10:24 AM

I explored Greg's suggestion of checking if the functions were external symbols. As suspected the overriden mmap is not external, so we can use this check to avoid calling it!

It does look to me that Pavel's suggestion is the way to go in the future but I don't have the knowledge to implement the suggestion nor (unfortunately) the bandwidth to gain that knowledge right. In the meanwhile I was hoping we could go with this new version of the patch.

Glad the other mmap was not marked external, that makes this much easier.

lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp
54

Why are we checking "IsWeak()" here?

aadsm added inline comments.Oct 5 2020, 3:52 PM
lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp
54

A weak symbol is also an external symbol, but it's weak in the sense that another external symbol with the same name will take precedence over it (as far as I understood).

clayborg requested changes to this revision.Oct 5 2020, 11:22 PM

We probably don't need to check for IsWeak() as we wouldn't want an internal + weak symbol (if there is such a thing) to match and be used.

lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp
36

Might be nice to return a llvm::Error here instead of pool? Maybe to help explain what went wrong? like any of:

"thread required to call mmap"
"couldn't find any symbols named 'mmap'"
"no external symbols named 'mmap' were found"
"mmap call failed" (for when it returns UINT32_MAX for 4 byte addresses or UINT64_MAX for 8 byte addresses"

It have been nice to see a nice error message during the expression evaluation prior to this fix. At the very least we should log to the expression log channel that mmap failed if we choose not to return an error.

54

I think we only need to check for external here. Any weak symbol will also need to be external, but if it isn't we don't want that symbol.

This revision now requires changes to proceed.Oct 5 2020, 11:22 PM
labath added a comment.Oct 6 2020, 5:35 AM

Regardless of the future of the syscall approach, I think limiting ourselves to external mmap symbols is a good idea. And if it fixes your problem -- great.

It would be good to have a test for that, though. Maybe something like this would do?

static void *mmap() { return (void *)47; }

int call_me() { return 42; }

int main() {
  mmap();
  return call_me();
}

right now, I am unable to evaluate call_me() with lldb. I guess your patch should make that possible?

lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp
54

Your understanding is correct, at least at the object file level -- I'm not sure whether IsWeak() implies IsExternal() inside lldb. However, I would actually argue for removal of IsWeak() for a different reason -- any weak definition of mmap is not going to be used by the process since libc already has a strong definition of that symbol.

If we really end up in a situation where we only have a weak mmap symbol around, then this is probably a sufficiently strange setup that we don't want to be randomly calling that function.

labath added a comment.Oct 6 2020, 5:44 AM

That sounds like a plan. FWIW, here's the implementation I hacked up today:

Status NativeProcessLinux::AllocateMemory(size_t size, uint32_t permissions,
                                          lldb::addr_t &addr) {
  PopulateMemoryRegionCache();
  auto region_it = llvm::find_if(m_mem_region_cache, [](const auto &pair) {
    return pair.first.GetExecutable() == MemoryRegionInfo::eYes;
  });
  if (region_it == m_mem_region_cache.end())
    return Status("No executable memory region found!");
  addr_t exe_addr = region_it->first.GetRange().GetRangeBase();

  NativeThreadLinux &thread = *GetThreadByID(GetID());
  assert(thread.GetState() == eStateStopped);

  int prot = PROT_NONE;
  if (permissions & ePermissionsReadable)
    prot |= PROT_READ;
  if (permissions & ePermissionsWritable)
    prot |= PROT_WRITE;
  if (permissions & ePermissionsExecutable)
    prot |= PROT_EXEC;

  NativeRegisterContextLinux &reg_ctx = thread.GetRegisterContext();
  DataBufferSP registers_sp;
  reg_ctx.ReadAllRegisterValues(registers_sp);
  uint8_t memory[2];
  size_t bytes_read;
  ReadMemory(exe_addr, memory, 2, bytes_read);

  reg_ctx.SetPC(exe_addr);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rax_x86_64, SYS_mmap);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rdi_x86_64, 0);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rsi_x86_64, size);
  reg_ctx.WriteRegisterFromUnsigned(lldb_rdx_x86_64, prot);
  reg_ctx.WriteRegisterFromUnsigned(lldb_r10_x86_64,
                                    MAP_ANONYMOUS | MAP_PRIVATE);
  reg_ctx.WriteRegisterFromUnsigned(lldb_r8_x86_64, -1);
  reg_ctx.WriteRegisterFromUnsigned(lldb_r9_x86_64, 0);
  WriteMemory(exe_addr, "\x0f\x05", 2, bytes_read);
  PtraceWrapper(PTRACE_SINGLESTEP, thread.GetID(), nullptr, nullptr);
  int status;
  ::pid_t wait_pid = llvm::sys::RetryAfterSignal(-1, ::waitpid, thread.GetID(),
                                                 &status, __WALL);
  assert((unsigned)wait_pid == thread.GetID());
  addr = reg_ctx.ReadRegisterAsUnsigned(lldb_rax_x86_64, -ESRCH);

  WriteMemory(exe_addr, memory, 2, bytes_read);
  reg_ctx.WriteAllRegisterValues(registers_sp);

  if (addr > -4096)
    return Status(-addr, eErrorTypePOSIX);
  return Status();
}

It needs more error handling, and generalization to non-x86 architectures, but it actually works, and is enough to make all but one test pass (TestBitfields.py seems to fail due to some data corruption -- quite puzzling).

Very cool. That seems simple enough to allow us to try this out, at least on unix variants.

Ok, I'll to find some time to implement that properly.

Does PTRACE_SINGLESTEP cause _only_ the current thread to single step and keep all other threads paused?

Affirmative.

This nice thing is lldb-server _is_ compiled natively for each system so we can rely on system headers for the sys call numbers if they are available.

Unfortunately things are not as rosy as that. On linux, the syscall numbers can differ between 32- and 64-bit variants (on x86_64, SYS_mmap is 9, i386 it's 90). Since 64-bit lldb-server can debug a 32-bit inferior, it still needs to hardcode the syscall number for it. I believe the other mmap constants (PROT_***, MAP_***) are the same on all variants, though they could theoretically be different...

So looks like this could be made to work. Did you also do the deallocate memory? I ran a few expressions and I am not sure that we ever use the deallocate memory packets to debugserver on mac, so it might not be needed.

For my prototype. I did not, but the code for that would be fairly similar (and could be factored into a common function, etc..).

aadsm added inline comments.Oct 6 2020, 8:47 AM
lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp
54

All (with the exception of the one overriden by the leak sanitizer) mmap symbols in the symbol table are Weak bound but none are External bound, this was the main reason that lead me to investigate the difference between the two.

Not sure how to check how lldb interprets the Weak overall, but I think it kind of does, because that's what I saw when I dumped the symbol table:

(lldb) target modules dump symtab libc.so.6
               Debug symbol
               |Synthetic symbol
               ||Externally Visible
               |||
Index   UserID DSX Type            File Address/Value Load Address       Size               Flags      Name
------- ------ --- --------------- ------------------ ------------------ ------------------ ---------- ----------------------------------
...
[ 6945]   6947   X Code            0x000000000010b5e0 0x00007ffff69db5e0 0x00000000000000f9 0x00000012 __mmap
...

Another solution I thought was to add a IsLocal to the Symbol (and use !IsLocal) but then not really sure how to implement this for all Symbols not ELFSymbol.

aadsm added inline comments.Oct 6 2020, 8:49 AM
lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp
36

I'd prefer to make this part of another diff and focus this one on the fix.

labath added inline comments.Oct 6 2020, 9:09 AM
lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp
54

Interesting. I guess I should have verified my assumptions. With that in mind, I think checking for both weak and external (strong) symbols is fine.

clayborg added inline comments.Oct 6 2020, 5:02 PM
lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp
54

I think this is a bug in the ObjectFileELF where it is setting weak but not external for weak symbols. Weak symbols would be useless if they weren't external as the first one to get used should end up winning and all other weak symbols should resolve to the first symbol to come along that matches when dyld does a lookup. So I think we should verify this with some linker folks and remove IsWeak from this if statement, and fix the ObjectFileELF symbol table parsing code.

labath added inline comments.Oct 7 2020, 4:55 AM
lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp
54

That sounds reasonable. The documentation for SBSymbol::IsExternal says "A read only property that returns a boolean value that indicates if this symbol is externally visible (exported) from the module that contains it." An elf weak symbol definitely fits that definition.

Note that this would be a different meaning of "external" than what is used for the "external" llvm linkage type. There, it refers to a specific linkage type (externally visible and strong), distinct from weak (and linkonce, etc) linkage types, though most of them are still externally visible.

clayborg added inline comments.Oct 15 2020, 11:46 AM
lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp
54

I checked ObjectFileMachO and it indeed sets external and weak separately. For the debugger, we just want to know if the symbol is externally visible from an object file standpoint. So we can fix ObjectFileELF.cpp with code:

const bool is_weak = symbol.getBinding() == STB_WEAK;
Symbol dc_symbol(
    i + start_id, // ID is the original symbol table index.
    mangled,
    symbol_type,                    // Type of this symbol
    is_global | is_weak,            // Is this globally visible?
    false,                          // Is this symbol debug info?
    false,                          // Is this symbol a trampoline?
    false,                          // Is this symbol artificial?
    AddressRange(symbol_section_sp, // Section in which this symbol is
                                    // defined or null.
                 symbol_value,      // Offset in section or symbol value.
                 symbol.st_size),   // Size in bytes of this symbol.
    symbol_size_valid,              // Symbol size is valid
    has_suffix,                     // Contains linker annotations?
    flags);                         // Symbol flags.
if (is_weak)
  dc_symbol.SetIsWeak(true);
aadsm added a comment.Oct 18 2020, 4:52 PM

Should I still go ahead with this since @labath implemented the memory allocation on lldb-server?

Should I still go ahead with this since @labath implemented the memory allocation on lldb-server?

I think going for global symbols still makes, and could still benefit some platforms, as my implementation is only for x86 linux.

Which reminds me I wanted to implement that stuff for non-x86 arches too. Giving it a try on arm would be appreciated.

Should I still go ahead with this since @labath implemented the memory allocation on lldb-server?

yes! It might be best to move this lookup functionality into lldb_private::Target so other bits of code that try to call a function directory can benefit. But we don't need to do that for this patch if you don't have time as there are not too many locations that do this.

aadsm updated this revision to Diff 302178.Nov 1 2020, 3:46 PM

Also set the symbol as external when it is weak

labath added a comment.Nov 2 2020, 1:10 AM

It would be good to have a test for the ObjectFileELF part at least. I'm thinking of something similar to what we have in test/Shell/ObjectFile/PECOFF/symbol.yaml for COFF...

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2218

I think it'd be more natural to do the STB_WEAK check here. Maybe something like:

switch (symbol.getBinding()) {
  case STB_GLOBAL:
    is_external = true;
    break;
   case STB_WEAK:
    is_weak = true;
    is_external = true;
}