lldb attach operation fails when the ptrace() call fails on a traced
process. The error reporting in this scenario could be improved by
including what went wrong and what could be done by the user to resolve
the situation.
Details
- Reviewers
jingham rupprecht DavidSpickett
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This patch is supposed to fix this Bug : 39166
I compiled the code and used the binary with local/remote debugging scenarios. On attach failure this is the output it produces :-
$ ./lldb -p 46503 (lldb) process attach --pid 46503 Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. error: attach failed: Operation not permitted (lldb)
My queries :-
- Is AppendMessage() really the right function to use in an error scenario ? or the additional message should just be appended to AppendErrorWithFormat() ?
- The error object is being returned by the Attach() call on line:399 so, should this message be handled by the Attach() function and not by DoExecute() ?
Thanks for looking at this, I've been tripped up by this myself.
Is AppendMessage() really the right function to use in an error scenario ? or the additional message should just be appended to AppendErrorWithFormat() ?
It changes whether the output goes to stdout or stderr and whether we set the return status to failed. So I would merge it into one call to AppendErrorWithFormat.
The error object is being returned by the Attach() call on line:399 so, should this message be handled by the Attach() function and not by DoExecute() ?
Go to https://lldb.llvm.org/python_reference/index.html, search for SBTarget and you'll see a few attach methods there. They call Target Attach so if you added it there they'd also get it. Which seems like a good thing to me. (but see my other comment about platforms)
lldb/source/Commands/CommandObjectProcess.cpp | ||
---|---|---|
416 | While this is useful for Linux/Unixish things, lldb also runs on Windows as well where this message isn't going to make a lot of sense. Adding a if linux ... at this level violates the separation of concerns though. Would it be possible to move this message further down into Target::Attach to a place where we know the target kind? (that would also answer your question about what level it should be at) If you could return a status from that level with this message then you'd only see it where it makes sense. |
Congrats on getting started on your first patch! I improving this error message really seems like a good idea.
From what I can see the error message here is identical to GDB's which is a different project with an incompatible license. No idea if this is large enough of a copy to bring us into the realm of copyright (not a lawyer), but I think formulating our own (maybe even better?) error message would anyway be a good idea. What about something along those lines:
error: attach failed: <Whatever error we already would return here> (This line is just the normal LLDB attach error) Note that attaching might have failed due to the ptrace_scope security policy which restricts debuggers from attaching to other processes. See the ptrace_scope documentation for more information: https://www.kernel.org/doc/Documentation/security/Yama.txt The current ptrace_scope policy can be found here: /proc/sys/kernel/yama/ptrace_scope
(Not sure how I feel about linking to some internet URL, but I couldn't find any man page for Yama/ptrace_scope)
Also I wonder how we could make sure we emit this diagnostic in cases where the ptrace_scope is actually the reason for the failed attach. The proper way to check this seems to be checking the errno after we call ptrace and then propagate the error all the way back to lldb from lldb-server. From the lldb side I don't think we have any way of knowing why the attach actually failed so we would emit this error speculatively which doesn't seem ideal. It still sounds like a better solution than having just this generic error message that doesn't help anyone, so I think David's suggestion + a FIXME is maybe a good compromise here.
It'd also be helpful to include the actual command to enable it, i.e. either echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope or sudo sysctl -w kernel.yama.ptrace_scope=0 (I think both commands are equivalent)
(Not sure how I feel about linking to some internet URL, but I couldn't find any man page for Yama/ptrace_scope)
It's part of the ptrace man page: https://manpages.debian.org/buster/manpages-dev/ptrace.2.en.html#/proc/sys/kernel/yama/ptrace_scope (I think a URL is probably fine though... maybe we could put one on LLDB's page if we're worried about kernel.org not having a stable URL)
Also I wonder how we could make sure we emit this diagnostic in cases where the ptrace_scope is actually the reason for the failed attach. The proper way to check this seems to be checking the errno after we call ptrace and then propagate the error all the way back to lldb from lldb-server. From the lldb side I don't think we have any way of knowing why the attach actually failed so we would emit this error speculatively which doesn't seem ideal. It still sounds like a better solution than having just this generic error message that doesn't help anyone, so I think David's suggestion + a FIXME is maybe a good compromise here.
Can we have LLDB read the value of /proc/sys/kernel/yama/ptrace_scope, and only print the error if the file exists and is not 0?
Thanks a lot for passing on this knowledge.
The error object is being returned by the Attach() call on line:399 so, should this message be handled by the Attach() function and not by DoExecute() ?
Go to https://lldb.llvm.org/python_reference/index.html, search for SBTarget and you'll see a few attach methods there. They call Target Attach so if you added it there they'd also get it. Which seems like a good thing to me. (but see my other comment about platforms)
Would it be possible to move this message further down into Target::Attach to a place where we know the target kind? (that would also answer your question about what level it should be at)
If you could return a status from that level with this message then you'd only see it where it makes sense.
Looks like target related information is available in TargetProperties class whom Target class is inheriting so perhaps a decision based on target-type could be made there. I will explore further into this, as in which property(s) can be used and how and have an update.
Thanks : ) !
From what I can see the error message here is identical to GDB's which is a different project with an incompatible license. No idea if this is large enough of a copy to bring us into the realm of copyright (not a lawyer), but I think formulating our own (maybe even better?) error message would anyway be a good idea. What about something along those lines:
error: attach failed: <Whatever error we already would return here> (This line is just the normal LLDB attach error) Note that attaching might have failed due to the ptrace_scope security policy which restricts debuggers from attaching to other processes. See the ptrace_scope documentation for more information: https://www.kernel.org/doc/Documentation/security/Yama.txt The current ptrace_scope policy can be found here: /proc/sys/kernel/yama/ptrace_scope(Not sure how I feel about linking to some internet URL, but I couldn't find any man page for Yama/ptrace_scope)
@teemperor @rupprecht
Agreed ! Having something of our own is a good idea.
I will change it to what you guys are suggesting.
Reading the value of /proc/sys/kernel/yama/ptrace_scope is possible even when running as a non-root user, so user permissions would not pose a problem here.
[apoos_maximus@host]$ sysctl kernel.yama.ptrace_scope kernel.yama.ptrace_scope = 1 [apoos_maximus@host]$ cat /proc/sys/kernel/yama/ptrace_scope 1
I will check into the ptrace-API to see if there is a direct way for this, else look into some other kernel call through which we can read this value.
moved the error message building logic to Target::Attach()
where the ptrace policy related mesasge gets appended only
if the debugee process is running on LinuxOS
added logic to report ptrace related error only if ptrace policy
was the actual reason for failure. i.e. by reading the value of
/proc/sys/kernel/yama/ptrace_scope file if it exists, and setting the
ptrace error only if the value is not 0.
Changed the error message to be more informative, and added things,
based on the previous review.
Hi,
@DavidSpickett, @teemperor, @rupprecht
After our previous discussions, I have modified the patch.
Now :-
- OS is detected in Target::Attach(), and reason for failure is analyzed accordingly.
- ptrace_scope related error message is only emitted after checking :-
- If /proc/sys/kernel/yama/ptrace_scope file exists
- If it Exists weather it's value is non-zero.
- If Either don't satisfy user is asked try again as the root user, as only possibly left (I think..) is that user is trying to debug a process, owned by the root user. In which case setting ptrace_policy to zero alone will not help.
- Changed the Error Message to be more informative, and unique compared to gdb. I went with the kernel.org documentation as it's more generic to all linux OS es.
I have tested the above code by trying to debug a process on localhost, and also by debugging a remote process. It's works as expected.
I am yet to build this on a windows machine, and test. As I don't own a Mac machine I'll need help from someone who does.
Only the Os detection scenario needs to be tested here, when attach fails i.e. process hasn't stopped.
Although this approach seems to work I am yet to implement the error-code approach suggested by @teemperor, I have a few questions regarding the same :-
- Will we have to introduce new Error Code particulary for the ptrace_conditon?
- This logic will reside in Process::Attach() ?
- Since Error code needs to be propagated from lldb-server to lldb, how are the two interacting (I meant code wise ...) :
- In case of debugging a remote process : does local lldb, control the remote lldb through lldb-server ? have I understood it right ?
- In this case lldb/lldb-server will have to get cross-compiled shipped to the remote host and than tested using the same lldb on localhost.
- How are new Error Codes Introduced into the system ? And how are their behaviours modified, like error.AsCString()
Maybe I can help a little with the error code approach/add on.
lldb communicates to lldb-server using gdb-remote packets, you can enable logging of these with log enable gdb-remote packets. If you're running on the same machine, lldb will start an lldb-server in gdbserver mode then connect to that. Usually if the target is a different machine you'll run lldb-server in "platform" mode there and the first thing lldb will ask it to do is spawn another lldb-server in gdb-server mode, then lldb will use that. (and of course you could manually do that step and connect to it with the gdb-remote command if you want)
Here's a log of me trying to attach to a process that GDB is already debugging:
(lldb) process attach -p 833700 <...> b-remote.async> < 17> send packet: $vAttach;cb8a4#98 b-remote.async> < 54> read packet: $E01;4f7065726174696f6e206e6f74207065726d6974746564#83 error: attach failed: Operation not permitted Try again as the root user
The first 90% is negotiating capabilities of the server/client. The important bit here is at the end.
We're asking to attach to 833700 in hex then we get a reply with an error code 1 and an error string encoded as hex chars.
The 1 is the ptrace errno and the string is the string description of that errno.
Here's me connecting to a process that doesn't exist:
(lldb) log enable gdb-remote packets (lldb) process attach -p 123456 <...> b-remote.async> < 17> send packet: $vAttach;1e240#32 b-remote.async> < 70> read packet: $Eff;43616e6e6f74206765742070726f6365737320617263686974656374757265#d4 error: attach failed: Cannot get process architecture Try again as the root user
Here you could argue the "Try again" doesn't help. Also us not just saying "no such process" doesn't help but you get the idea.
The proper way to check this seems to be checking the errno after we call ptrace and then propagate the error all the way back to lldb from lldb-server.
So in fact we *are* doing this but there are some situations (I just found the one above) where either we never get to calling ptrace or the errno is overwritten by some other call we make.
(0xff == -1 which I think is inconvertible_error_code which is what you use when you just want to make a generic error with some string)
What you could do is just look for the "Operation not permitted" case, and only show the message then. By the time you get the status in the lldb it won't have eErrorTypePosix but if you already know in that specific scenario that it will be, you can just compare to the errno code.
(be wary of including linux specific headers though, there's probably an llvm header you could use to make it generic)
There might be path where we return 1 for non ptrace reasons but it would narrow it down.
lldb/source/Target/Target.cpp | ||
---|---|---|
3121 ↗ | (On Diff #370960) | Make this function static since it's only used in this file. |
3143 ↗ | (On Diff #370960) | I get warnings like this with clang-12: /home/david.spickett/llvm-project/lldb/source/Target/Target.cpp:3144:11: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings] "Note that attaching might have failed due to the ptrace_scope " ^ I think just making status_message const char* will fix that. |
This seems like a good idea, but the functionality belongs in the Linux Platform, not in the Target.cpp. We try really hard to keep platform specific details out of the generic files.
lldb/source/Target/Target.cpp | ||
---|---|---|
3121 ↗ | (On Diff #370960) | This function does not belong in Target.cpp. This is Platform specific functionality and so belongs in the Linux Platform.. You need to add a "GetLaunchErrorDetails" or some better name for the general API, and implement that for the Unix systems that support proc, or only Linux if this functionality is only on Linux. |
clang-format: please reformat the code