This is an archive of the discontinued LLVM Phabricator instance.

[lldb-server][linux] Add ability to allocate memory
ClosedPublic

Authored by labath on Oct 9 2020, 5:39 AM.

Details

Summary

This patch adds support for the _M and _m gdb-remote packets, which
(de)allocate memory in the inferior. This works by "injecting" a
m(un)map syscall into the inferior. This consists of:

  • finding an executable page of memory
  • writing the syscall opcode to it
  • setting up registers according to the os syscall convention
  • single stepping over the syscall

The advantage of this approach over calling the mmap function is that
this works even in case the mmap function is buggy or unavailable. The
disadvantage is it is more platform-dependent, which is why this patch
only works on X86 (_32 and _64) right now. Adding support for other
linux architectures should be easy and consist of defining the
appropriate syscall constants. Adding support for other OSes depends on
the its ability to do a similar trick.

Depends on D89121.

Diff Detail

Event Timeline

labath created this revision.Oct 9 2020, 5:39 AM
Herald added a project: Restricted Project. · View Herald Transcript
labath requested review of this revision.Oct 9 2020, 5:39 AM
labath added inline comments.Oct 9 2020, 5:44 AM
lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
1350–1380

How much of this would be useful for BSDs? Should some of this be pushed to the base classes?

DavidSpickett added inline comments.
lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
1450

Is it possible that the page we find is executable but not writeable and is this part intended to handle that?
(maybe this is a silly question, the code had to be written into that page in the first place I guess)

1454

Could you add a comment here explaining this? Like "either we single step just syscall, or continue then hit a trap instruction after the syscall"
(if I have that right)

The description for PTRACE_SINGLESTEP confused me a bit with "next entry to":

Restart the stopped tracee as for PTRACE_CONT, but arrange for
the tracee to be stopped at the next entry to or exit from a
system call, or after execution of a single instruction,
respectively.

But I assume what happens here is you step "syscall" and then the whole mmap call happens, then you come back. Instead of getting kept before the mmap actually happens.

1469

What does this calculation do/mean? (0x1000 is 4k which is a page size maybe?)

1471

For these ifs, are you putting {} because the return is split over two lines? I think it would compile without but is this one of the exceptions to the coding standard? (if clang-format isn't doing this for you that is)

lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_x86_64.cpp
1249

What's the source for these numbers?

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
2346

(assuming this packet type is supported by GDB already) Does it also use hex for the size field? I ask because I came across some file loading packets that were hex for lldb, int for gdb.

Obviously we don't have a hard requirement to match but if we're adding something new might as well.

2366

Is no permissions here a valid packet? (as long as mmap doesn't mind I guess it is)

lldb/test/API/lang/c/stepping/TestStepAndBreakpoints.py
23

This test now passes *because* we have the ability to allocate memory, correct? (I thought it was a stray change at first)

DavidSpickett added inline comments.Oct 9 2020, 7:25 AM
lldb/test/API/lang/c/stepping/TestStepAndBreakpoints.py
23

Actually shouldn't this be an expected failure on non x86 linux?

jankratochvil added inline comments.Oct 9 2020, 8:37 AM
lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
1354

Finding arbitrary address would not be safe if LLDB supports non-stop mode if implemented. It also makes the address a bit non-deterministic. IIRC GDB is using executable's e_entry (="_start" if it exists). But I do not have any strong reason for that.

1494

Some sanity check 'permissions' does not have set a bit which NativeProcessLinux::AllocateMemory does not understand?

labath updated this revision to Diff 297523.Oct 12 2020, 2:44 AM

address review comments

lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
1354

Lldb also uses the program entry point for the "call mmap(...)" approach of allocating memory. That is a pretty good choice when one needs to find a line of code that will not be executed during normal program operation (not stop or otherwise). However:

  • here we don't need that requirement (for all-stop mode, anyway), as we're not calling any function (even mmap can execute a fair amount of code). We're just executing a single instruction.
  • technically, there's nothing preventing the application from unmapping the entry point address, or reusing it for something else. If I wanted a foolproof solution, I'd still need to implement a fallback algorithm (as well as the code to search for the entry point, as we right now don't have anything that lldb-server could use).

Given that, it seems better to just go for the more complete solution straight away. It's true that this makes the address harder to predict (I'm trying not to use the word nondeterministic, because that's not really it), but such is life. And coming up with a non-stop-compatible solution for this is so complicated that I'd leave this for some other time (worst case: we disable this feature, or force a temporary stop of all threads).

1450

Ptrace bypasses normal write-protection checks (~all code is not writable these days, and we wouldn't be able to set breakpoints otherwise), so checking for writable addresses would not be useful.

1454

Yes, that's exactly what happens. Ptrace can't step "into" the kernel. The reason that description is confusing is because it a description of both PTRACE_SINGLESTEP *and* PTRACE_SYSCALL, with the latter stopping before the syscall.

I've considered using a sequence of two PTRACE_SYSCALLs to avoid the trap instruction requirement, but it wasn't clear to me that this would help in any way (and it would make the code longer).

1469

This is the linux syscall convention for returning errors. The fact that it matches the page size is probably not accidental, though it was not strictly necessary.

I've added a short comment to elaborate.

1471

clang-format never adds new tokens -- it just reformats existing ones.

Regarding the coding standard, this is in a fairly grey area. I could invoke the "braces should be used when a single-statement body is complex enough that it becomes difficult to see where the block containing the following statement began" rule and say this statement is "complex enough" (I think it was even more complex when I've first written it). I'm sure some people wouldn't consider it complex, but I think it's better to err on the side of adding more braces (I generally add braces for all multi-line statements).

1494

A new permission flag seems fairly unlikely, but sure, why not..

lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_x86_64.cpp
1249

There are plenty of ways to find this -- internet is full of linux system call tables. I just preprocessed a file referencing SYS_MMAP/MUNMAP constants. :P

I also added a note about the file they can be found in (/usr/include/asm/unistd.h)

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
2346

This isn't adding a completely new packet (you'll notice the patch does not include any clientside changes). This packet is already supported by debugserver, and I believe it is an lldb addition -- I did not come across anything like it in the gdb docs.

However, I would be interested to hear about the protocol mismatches you've found.

2366

Yeah, you can do that in theory -- permissions can be later changed with mprotect(2). Lldb does not make any use of that though.

lldb/test/API/lang/c/stepping/TestStepAndBreakpoints.py
23

This is a funny one. The reason this test fails is because of the stop-id check on line 148 (IIRC). Without this feature lldb will allocate memory by finding and calling the mmap libc function. This seems to mess up the "expression stop id" counter. So it's a real bug, but not one that is likely to cause problems for most people.

However, you're right that this "bug" is not "fixed" only on x86.

In file included from /home/jkratoch/redhat/llvm-monorepo/lldb/source/Host/common/NativeProcessProtocol.cpp:9:
/home/jkratoch/redhat/llvm-monorepo/lldb/include/lldb/Host/common/NativeProcessProtocol.h:20:10: fatal error: 'lldb/Utility/UnimplementedError.h' file not found
#include "lldb/Utility/UnimplementedError.h"

In file included from /home/jkratoch/redhat/llvm-monorepo/lldb/source/Host/common/NativeProcessProtocol.cpp:9:
/home/jkratoch/redhat/llvm-monorepo/lldb/include/lldb/Host/common/NativeProcessProtocol.h:20:10: fatal error: 'lldb/Utility/UnimplementedError.h' file not found
#include "lldb/Utility/UnimplementedError.h"

That's because you need the dependant revision D89121 (which is now merged).

jankratochvil accepted this revision.Oct 12 2020, 12:54 PM
jankratochvil added inline comments.
lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
1362

There does not need to exist a thread with TID equal to the process PID. That TID could already exit while other TIDs of that PID may be still running.
https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=432b4d03ad0f23970315e9f9dec080ab4a9ab94b
simplified and deGPLed as: https://people.redhat.com/jkratoch/leader-exit2.C
Just currently LLDB server cannot attach to such TID as:

lldb-server g --attach >TID< :1234
ptrace(PTRACE_ATTACH, >PID<) = -1 EPERM (Operation not permitted)
write(2, "error: failed to attach to pid >TID<: Operation not permitted\n", 64) = 64

But if this bug was fixed then I think this statement could SEGV (unless LLDB will have some internal zombie TID==PID).

lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_x86_64.cpp
1237

static const ... (4x)

This revision is now accepted and ready to land.Oct 12 2020, 12:54 PM
labath marked 12 inline comments as done.Oct 13 2020, 8:52 AM
labath added inline comments.
lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
1362

Thanks for catching that. I had a vague feeling that it might be possible for the process to outlive its main thread, but I didn't know how. I've changed this to GetCurrentThread which should be always present.

labath updated this revision to Diff 297876.Oct 13 2020, 8:56 AM
  • GetCurrentThread
  • const
jankratochvil accepted this revision.Oct 13 2020, 9:02 AM
This revision was landed with ongoing or failed builds.Oct 14 2020, 6:02 AM
This revision was automatically updated to reflect the committed changes.

It looks like this changed caused a failure on the windows bot and your follow up change to fix the test left it still broken. Can you have a look?

It looks like the windows lldb-server implementation does not support all permission combinations (ConvertLldbToWinApiProtect in ProcessDebugger.cpp). I've xfailed the test for now -- I'll try to add the missing bits tomorrow.

I've also messed up the decorators on one of the tests.

a1ab2b77 should get that green again, I hope.