This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Improve error message when "lldb attach" fails
Needs RevisionPublic

Authored by apoos-maximus on Jul 17 2021, 2:14 PM.

Details

Summary

lldb attach operation fails when the ptrace() call fails on a traced
process. The error reporting in this scenario could be improved by
including what went wrong and what could be done by the user to resolve
the situation.

Diff Detail

Event Timeline

apoos-maximus requested review of this revision.Jul 17 2021, 2:14 PM
apoos-maximus created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJul 17 2021, 2:14 PM

This patch is supposed to fix this Bug : 39166
I compiled the code and used the binary with local/remote debugging scenarios. On attach failure this is the output it produces :-

$ ./lldb -p 46503
(lldb) process attach --pid 46503
Could not attach to process. If your uid matches the uid of the target process, 
check the setting of /proc/sys/kernel/yama/ptrace_scope, 
or try again as the root user.
error: attach failed: Operation not permitted
(lldb) 

My queries :-

  • Is AppendMessage() really the right function to use in an error scenario ? or the additional message should just be appended to AppendErrorWithFormat() ?
  • The error object is being returned by the Attach() call on line:399 so, should this message be handled by the Attach() function and not by DoExecute() ?

changed the code-styling to comply with pre-merge checks.

realigned patch against main branch

reformatted code to comply with pre-merge checks

Thanks for looking at this, I've been tripped up by this myself.

Is AppendMessage() really the right function to use in an error scenario ? or the additional message should just be appended to AppendErrorWithFormat() ?

It changes whether the output goes to stdout or stderr and whether we set the return status to failed. So I would merge it into one call to AppendErrorWithFormat.

The error object is being returned by the Attach() call on line:399 so, should this message be handled by the Attach() function and not by DoExecute() ?

Go to https://lldb.llvm.org/python_reference/index.html, search for SBTarget and you'll see a few attach methods there. They call Target Attach so if you added it there they'd also get it. Which seems like a good thing to me. (but see my other comment about platforms)

lldb/source/Commands/CommandObjectProcess.cpp
416

While this is useful for Linux/Unixish things, lldb also runs on Windows as well where this message isn't going to make a lot of sense. Adding a if linux ... at this level violates the separation of concerns though.
(yes an experienced user could ignore it but they'll probably at least wonder if lldb is confused)

Would it be possible to move this message further down into Target::Attach to a place where we know the target kind? (that would also answer your question about what level it should be at)

If you could return a status from that level with this message then you'd only see it where it makes sense.

Congrats on getting started on your first patch! I improving this error message really seems like a good idea.

From what I can see the error message here is identical to GDB's which is a different project with an incompatible license. No idea if this is large enough of a copy to bring us into the realm of copyright (not a lawyer), but I think formulating our own (maybe even better?) error message would anyway be a good idea. What about something along those lines:

error: attach failed: <Whatever error we already would return here> (This line is just the normal LLDB attach error)
Note that attaching might have failed due to the ptrace_scope security policy
which restricts debuggers from attaching to other processes. See
the ptrace_scope documentation for more information:
  https://www.kernel.org/doc/Documentation/security/Yama.txt
The current ptrace_scope policy can be found here:
  /proc/sys/kernel/yama/ptrace_scope

(Not sure how I feel about linking to some internet URL, but I couldn't find any man page for Yama/ptrace_scope)

Also I wonder how we could make sure we emit this diagnostic in cases where the ptrace_scope is actually the reason for the failed attach. The proper way to check this seems to be checking the errno after we call ptrace and then propagate the error all the way back to lldb from lldb-server. From the lldb side I don't think we have any way of knowing why the attach actually failed so we would emit this error speculatively which doesn't seem ideal. It still sounds like a better solution than having just this generic error message that doesn't help anyone, so I think David's suggestion + a FIXME is maybe a good compromise here.

Congrats on getting started on your first patch! I improving this error message really seems like a good idea.

From what I can see the error message here is identical to GDB's which is a different project with an incompatible license. No idea if this is large enough of a copy to bring us into the realm of copyright (not a lawyer), but I think formulating our own (maybe even better?) error message would anyway be a good idea. What about something along those lines:

error: attach failed: <Whatever error we already would return here> (This line is just the normal LLDB attach error)
Note that attaching might have failed due to the ptrace_scope security policy
which restricts debuggers from attaching to other processes. See
the ptrace_scope documentation for more information:
  https://www.kernel.org/doc/Documentation/security/Yama.txt
The current ptrace_scope policy can be found here:
  /proc/sys/kernel/yama/ptrace_scope

It'd also be helpful to include the actual command to enable it, i.e. either echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope or sudo sysctl -w kernel.yama.ptrace_scope=0 (I think both commands are equivalent)

(Not sure how I feel about linking to some internet URL, but I couldn't find any man page for Yama/ptrace_scope)

It's part of the ptrace man page: https://manpages.debian.org/buster/manpages-dev/ptrace.2.en.html#/proc/sys/kernel/yama/ptrace_scope (I think a URL is probably fine though... maybe we could put one on LLDB's page if we're worried about kernel.org not having a stable URL)

Also I wonder how we could make sure we emit this diagnostic in cases where the ptrace_scope is actually the reason for the failed attach. The proper way to check this seems to be checking the errno after we call ptrace and then propagate the error all the way back to lldb from lldb-server. From the lldb side I don't think we have any way of knowing why the attach actually failed so we would emit this error speculatively which doesn't seem ideal. It still sounds like a better solution than having just this generic error message that doesn't help anyone, so I think David's suggestion + a FIXME is maybe a good compromise here.

Can we have LLDB read the value of /proc/sys/kernel/yama/ptrace_scope, and only print the error if the file exists and is not 0?

It changes whether the output goes to stdout or stderr and whether we set the return status to failed. So I would merge it into one call to AppendErrorWithFormat.

Thanks a lot for passing on this knowledge.

The error object is being returned by the Attach() call on line:399 so, should this message be handled by the Attach() function and not by DoExecute() ?

Go to https://lldb.llvm.org/python_reference/index.html, search for SBTarget and you'll see a few attach methods there. They call Target Attach so if you added it there they'd also get it. Which seems like a good thing to me. (but see my other comment about platforms)

Would it be possible to move this message further down into Target::Attach to a place where we know the target kind? (that would also answer your question about what level it should be at)

If you could return a status from that level with this message then you'd only see it where it makes sense.

Looks like target related information is available in TargetProperties class whom Target class is inheriting so perhaps a decision based on target-type could be made there. I will explore further into this, as in which property(s) can be used and how and have an update.

apoos-maximus added a comment.EditedJul 21 2021, 3:51 PM

Congrats on getting started on your first patch! I improving this error message really seems like a good idea.

Thanks : ) !

From what I can see the error message here is identical to GDB's which is a different project with an incompatible license. No idea if this is large enough of a copy to bring us into the realm of copyright (not a lawyer), but I think formulating our own (maybe even better?) error message would anyway be a good idea. What about something along those lines:

error: attach failed: <Whatever error we already would return here> (This line is just the normal LLDB attach error)
Note that attaching might have failed due to the ptrace_scope security policy
which restricts debuggers from attaching to other processes. See
the ptrace_scope documentation for more information:
  https://www.kernel.org/doc/Documentation/security/Yama.txt
The current ptrace_scope policy can be found here:
  /proc/sys/kernel/yama/ptrace_scope

(Not sure how I feel about linking to some internet URL, but I couldn't find any man page for Yama/ptrace_scope)

It'd also be helpful to include the actual command to enable it, i.e. either echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope or sudo sysctl -w kernel.yama.ptrace_scope=0 (I think both commands are equivalent)

(Not sure how I feel about linking to some internet URL, but I couldn't find any man page for Yama/ptrace_scope)

It's part of the ptrace man page: https://manpages.debian.org/buster/manpages-dev/ptrace.2.en.html#/proc/sys/kernel/yama/ptrace_scope (I think a URL is probably fine though... maybe we could put one on LLDB's page if we're worried about kernel.org not having a stable URL)

@teemperor @rupprecht
Agreed ! Having something of our own is a good idea.
I will change it to what you guys are suggesting.

Also I wonder how we could make sure we emit this diagnostic in cases where the ptrace_scope is actually the reason for the failed attach. The proper way to check this seems to be checking the errno after we call ptrace and then propagate the error all the way back to lldb from lldb-server. From the lldb side I don't think we have any way of knowing why the attach actually failed so we would emit this error speculatively which doesn't seem ideal.

Can we have LLDB read the value of /proc/sys/kernel/yama/ptrace_scope, and only print the error if the file exists and is not 0?

Reading the value of /proc/sys/kernel/yama/ptrace_scope is possible even when running as a non-root user, so user permissions would not pose a problem here.

[apoos_maximus@host]$ sysctl kernel.yama.ptrace_scope       
kernel.yama.ptrace_scope = 1
[apoos_maximus@host]$ cat /proc/sys/kernel/yama/ptrace_scope       
1

I will check into the ptrace-API to see if there is a direct way for this, else look into some other kernel call through which we can read this value.

moved the error message building logic to Target::Attach()
where the ptrace policy related mesasge gets appended only
if the debugee process is running on LinuxOS

apoos-maximus marked an inline comment as done.Sep 4 2021, 3:09 PM

added logic to report ptrace related error only if ptrace policy
was the actual reason for failure. i.e. by reading the value of
/proc/sys/kernel/yama/ptrace_scope file if it exists, and setting the
ptrace error only if the value is not 0.

Changed the error message to be more informative, and added things,
based on the previous review.

Hi,
@DavidSpickett, @teemperor, @rupprecht

After our previous discussions, I have modified the patch.
Now :-

  1. OS is detected in Target::Attach(), and reason for failure is analyzed accordingly.
  2. ptrace_scope related error message is only emitted after checking :-
    • If /proc/sys/kernel/yama/ptrace_scope file exists
    • If it Exists weather it's value is non-zero.
    • If Either don't satisfy user is asked try again as the root user, as only possibly left (I think..) is that user is trying to debug a process, owned by the root user. In which case setting ptrace_policy to zero alone will not help.
  3. Changed the Error Message to be more informative, and unique compared to gdb. I went with the kernel.org documentation as it's more generic to all linux OS es.

I have tested the above code by trying to debug a process on localhost, and also by debugging a remote process. It's works as expected.
I am yet to build this on a windows machine, and test. As I don't own a Mac machine I'll need help from someone who does.
Only the Os detection scenario needs to be tested here, when attach fails i.e. process hasn't stopped.

Although this approach seems to work I am yet to implement the error-code approach suggested by @teemperor, I have a few questions regarding the same :-

  1. Will we have to introduce new Error Code particulary for the ptrace_conditon?
  2. This logic will reside in Process::Attach() ?
  3. Since Error code needs to be propagated from lldb-server to lldb, how are the two interacting (I meant code wise ...) :
    • In case of debugging a remote process : does local lldb, control the remote lldb through lldb-server ? have I understood it right ?
    • In this case lldb/lldb-server will have to get cross-compiled shipped to the remote host and than tested using the same lldb on localhost.
  4. How are new Error Codes Introduced into the system ? And how are their behaviours modified, like error.AsCString()
This comment was removed by apoos-maximus.

Maybe I can help a little with the error code approach/add on.

lldb communicates to lldb-server using gdb-remote packets, you can enable logging of these with log enable gdb-remote packets. If you're running on the same machine, lldb will start an lldb-server in gdbserver mode then connect to that. Usually if the target is a different machine you'll run lldb-server in "platform" mode there and the first thing lldb will ask it to do is spawn another lldb-server in gdb-server mode, then lldb will use that. (and of course you could manually do that step and connect to it with the gdb-remote command if you want)

Here's a log of me trying to attach to a process that GDB is already debugging:

(lldb) process attach -p 833700
<...>
b-remote.async>  <  17> send packet: $vAttach;cb8a4#98
b-remote.async>  <  54> read packet: $E01;4f7065726174696f6e206e6f74207065726d6974746564#83
error: attach failed: Operation not permitted
Try again as the root user

The first 90% is negotiating capabilities of the server/client. The important bit here is at the end.
We're asking to attach to 833700 in hex then we get a reply with an error code 1 and an error string encoded as hex chars.

The 1 is the ptrace errno and the string is the string description of that errno.

Here's me connecting to a process that doesn't exist:

(lldb) log enable gdb-remote packets
(lldb) process attach -p 123456
<...>
b-remote.async>  <  17> send packet: $vAttach;1e240#32
b-remote.async>  <  70> read packet: $Eff;43616e6e6f74206765742070726f6365737320617263686974656374757265#d4
error: attach failed: Cannot get process architecture
Try again as the root user

Here you could argue the "Try again" doesn't help. Also us not just saying "no such process" doesn't help but you get the idea.

The proper way to check this seems to be checking the errno after we call ptrace and then propagate the error all the way back to lldb from lldb-server.

So in fact we *are* doing this but there are some situations (I just found the one above) where either we never get to calling ptrace or the errno is overwritten by some other call we make.
(0xff == -1 which I think is inconvertible_error_code which is what you use when you just want to make a generic error with some string)

What you could do is just look for the "Operation not permitted" case, and only show the message then. By the time you get the status in the lldb it won't have eErrorTypePosix but if you already know in that specific scenario that it will be, you can just compare to the errno code.
(be wary of including linux specific headers though, there's probably an llvm header you could use to make it generic)

There might be path where we return 1 for non ptrace reasons but it would narrow it down.

lldb/source/Target/Target.cpp
3121 ↗(On Diff #370960)

Make this function static since it's only used in this file.

3143 ↗(On Diff #370960)

I get warnings like this with clang-12:

/home/david.spickett/llvm-project/lldb/source/Target/Target.cpp:3144:11: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings]
          "Note that attaching might have failed due to the ptrace_scope "
          ^

I think just making status_message const char* will fix that.

jingham requested changes to this revision.Sep 7 2021, 11:51 AM

This seems like a good idea, but the functionality belongs in the Linux Platform, not in the Target.cpp. We try really hard to keep platform specific details out of the generic files.

lldb/source/Target/Target.cpp
3121 ↗(On Diff #370960)

This function does not belong in Target.cpp. This is Platform specific functionality and so belongs in the Linux Platform.. You need to add a "GetLaunchErrorDetails" or some better name for the general API, and implement that for the Unix systems that support proc, or only Linux if this functionality is only on Linux.

This revision now requires changes to proceed.Sep 7 2021, 11:51 AM