Page MenuHomePhabricator

[WIP][RFC] Improve fetching the process list on the android platform
AbandonedPublic

Authored by wallace on Sep 25 2019, 1:26 PM.

Details

Summary

Context:
The android platform is using the bare PlatformRemoteGDBServer to find the list of processes, which returns a very small subset of processes. For example, I got this on my machine

(lldb) platform process list
2 matching processes were found on "remote-android"
PID    PARENT USER       TRIPLE                   NAME
====== ====== ========== ======================== ============================
23291  3177              aarch64-unknown-linux-android sh
23301  23291            aarch64-unknown-linux-android lldb-server

However, "adb shell ps -A" manages to get more than 700 processes on my machine.

Problem: We should make the android platform smarter so that it can show the output of all those processes

Solution:
Given that most modern android devices, including Samsung's, support "ps -A", the least effort solution for now is to use "ps -A" as the source of processes.
For devices that don't support "ps -A", a fallback to a plain "ps" invocation is included.
And even if "ps" fails, which I remember to have seen in super old feature phones, a fallback to the bare PlatformRemoteGDBServer is being included.

The output right now looks like this:

(lldb) platform process list
713 matching processes were found on "remote-android"
PID    PARENT USER       TRIPLE                   NAME
====== ====== ========== ======================== ============================
1      0                                          init
2      0                                          [kthreadd]
3      2                                          0]
5      2                                          0:0H]
7      2                                          [rcu_preempt]
...
23500  2                                          u16:15]
26985  978                                        com.samsung.android.oneconnect:Receiver
27594  978                                        com.samsung.android.dialer
27602  978                                        com.samsung.android.contacts
29464  2                                          7:0]
29497  2                                          [intr_syncd]
29999  979                                        com.tencent.mm:push
30134  978                                        com.google.android.googlequicksearchbox:search
30260  2                                          u17:0]
30272  978                                        com.sec.spp.push
31552  2                                          3:0]
32201  2                                          u16:7]

Help needed:

There are some things still left.

  1. architecture:

I don't know if we can simply reuse an existing variable and set it for all processes, or if we should really find the architecture of each process.

  1. user id:

ps returns a user name, but ProcessInfo expects a numeric user ID. Examples of user names are u0_a306, u0_a84, root, bluetooth, etc. Generally there's a new user name when an apk runs IIRC. Should we include a user name field in ProcessInfo for these cases?

  1. process name:

ProcessInfo stores its process name as a FileSpec, which does path splitting by / or \. For some problematic system processes, ps shows a process name between brackets like [irq/159-arm-smm]. The FileSpec file will try to parse it as an actual path and the output of the process list command will just include 159-arm-smm].
I imagine that it's reasonable to discard all the processes that have names between brackets.

  1. long process list:

Should we discard system and root processes? These are a lot (hundreds on my devices) and can't be debugged.

  1. c++:

I haven't written c++ in years, and i have forgotten almost everything. Please make as many advices as possible :)

Event Timeline

wallace created this revision.Sep 25 2019, 1:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 25 2019, 1:26 PM
wallace retitled this revision from adbps to [WIP][RFC] Improve fetching the process list on the android platform.Sep 25 2019, 1:57 PM
wallace edited the summary of this revision. (Show Details)
wallace updated this revision to Diff 221837.Sep 25 2019, 2:33 PM

remove unwanted files

wallace edited the summary of this revision. (Show Details)Sep 25 2019, 2:36 PM
wallace added reviewers: clayborg, aadsm.

There are some things still left.

architecture:
I don't know if we can simply reuse an existing variable and set it for all processes, or if we should really find the architecture of each process.

The only time we might run into an issue is when we have arm32 running on arm64. Not sure if that can be detected. For now we can deduce the devices architecture and apply to all?

user id:
ps returns a user name, but ProcessInfo expects a numeric user ID. Examples of user names are u0_a306, u0_a84, root, bluetooth, etc. Generally there's a new user name when an apk runs IIRC. Should we include a user name field in ProcessInfo for these cases?

No, we should try and figure out the user ID for the user name if possible. The platform code has code to get the username for a user ID.

process name:
ProcessInfo stores its process name as a FileSpec, which does path splitting by / or \. For some problematic system processes, ps shows a process name between brackets like [irq/159-arm-smm]. The FileSpec file will try to parse it as an actual path and the output of the process list command will just include 159-arm-smm].
I imagine that it's reasonable to discard all the processes that have names between brackets.

Not sure. Maybe we can get some info from the /proc/<pid>/maps to find the file for the process? We do want to get the process' main binary in the process info. Anything that we don't want to attach to and debug can be left off this list. Not sure if that means any process those name starts with a '[' character or not?

long process list:
Should we discard system and root processes? These are a lot (hundreds on my devices) and can't be debugged.

Probably best to only show things we can attach to. Not sure how to weed this out. If the lldb-server in platform mode _is_ running as root, we probably want to show the root processes maybe? Normally we show processes that the lldb-server user ID has access to, but this falls down real quick with Android since each app has its own user ID. Not sure what the best option is here.

c++:
I haven't written c++ in years, and i have forgotten almost everything. Please make as many advices as possible :)

Will do!

Added Pavel to the review list. Pavel, please add anyone that has Android expertise to this patch!

There are some things still left.

architecture:
I don't know if we can simply reuse an existing variable and set it for all processes, or if we should really find the architecture of each process.

The only time we might run into an issue is when we have arm32 running on arm64. Not sure if that can be detected. For now we can deduce the devices architecture and apply to all?

If you ask the device for what architectures it supports, you could get a list. It might be worth figuring out what architecture your process was built to run on.

user id:
ps returns a user name, but ProcessInfo expects a numeric user ID. Examples of user names are u0_a306, u0_a84, root, bluetooth, etc. Generally there's a new user name when an apk runs IIRC. Should we include a user name field in ProcessInfo for these cases?

No, we should try and figure out the user ID for the user name if possible. The platform code has code to get the username for a user ID.

+1

process name:
ProcessInfo stores its process name as a FileSpec, which does path splitting by / or \. For some problematic system processes, ps shows a process name between brackets like [irq/159-arm-smm]. The FileSpec file will try to parse it as an actual path and the output of the process list command will just include 159-arm-smm].
I imagine that it's reasonable to discard all the processes that have names between brackets.

Not sure. Maybe we can get some info from the /proc/<pid>/maps to find the file for the process? We do want to get the process' main binary in the process info. Anything that we don't want to attach to and debug can be left off this list. Not sure if that means any process those name starts with a '[' character or not?

It might be worth not storing process names as FileSpecs? I think it could be a good idea to have a platform-specific method of finding the executable you want to attach to, even if its the case that most platforms will just have the binary as the process' main binary. Android, for example, might have the app's name as the name (com.foo.bar) meaning that FileSpec might be insufficient here.

Also, my understanding is that process names with brackets like that are generally system services or processes that the kernel has running. Most people probably can't debug these under normal circumstances.

long process list:
Should we discard system and root processes? These are a lot (hundreds on my devices) and can't be debugged.

Probably best to only show things we can attach to. Not sure how to weed this out. If the lldb-server in platform mode _is_ running as root, we probably want to show the root processes maybe? Normally we show processes that the lldb-server user ID has access to, but this falls down real quick with Android since each app has its own user ID. Not sure what the best option is here.

If you're running as root (which you might be if you have a rooted device) you probably want to see root's processes. I'm unsure if there is an actual good way of filtering processes here. Discarding system seems relatively safe to me here, but beyond that it's hard to say. I generally think lldb should err on the side of showing things that you won't need rather than not showing things that you might need.

Weeding out things that you can't attach to is going to be kind of complicated. I think that you can have heuristics that help (e.g. can't attach to root processes if you're not root) but there's probably not a nice surefire way of doing it. :(

xiaobai added inline comments.Sep 25 2019, 4:24 PM
lldb/source/Plugins/Platform/Android/PlatformAndroidRemoteGDBServer.cpp
240

You could return the PsColumsnIndices directly instead of returning the bool, then have the caller check for validity.

302

Where does 5 seconds come from here?

340

nit: remove the else

I think we should take a step back first. What is the behavior we expect from the "platform process list" command ? I think the current expectation is (and this is consistent with @clayborg's comment in https://reviews.llvm.org/D68048#1683238) is that it should only return the processes that we can attach to. The way lldb-server implements that right now is via comparing user ids https://github.com/llvm-mirror/lldb/blob/master/source/Host/linux/Host.cpp#L250. While these user ids don't really correspond to what android calls "users", they in fact do implement the "can I attach to this" semantics, as you cannot attach to a process with a different user id (without doing some serious run-as magic, which lldb-server does not know how to do right now).

So, I am not sure if there anything to change here, really... If your goal is to be able to see _all_ processes, even those you cannot attach to, then maybe we should have a separate flag/command for that. The FindProcesses API already supports being able to show all processes, but it does not look like there's a way to set that from the "platform process list" command...

We were discussing things over here and were wondering if we even need the lldb-server in order to connect to the platform, or if we should just implement everything through command line commands like "adb", or link against a ADB shared library (is there one?), or another way? There are complexities that make the lldb-server in platform mode less useful like needing to run lldb-platform as the user for the application in order to get things working.

For example, if we just want to connect to a device before we have an app we want to debug, like if we wanted to just attach to a process, we currently do:

(lldb) platform select remote-android
(lldb) platform connect tcp://1234

And the user must launch lldb-server in platform mode themselves, which is not optimal. A better experience might be to do:

(lldb) platform select remote-android
(lldb) platform connect

with no args to "platform connect" and LLDB will auto connect to a single device if only one device is available. If multiple devices are available we would need to name the device like:

(lldb) platform connect device123

where "device123" is the name or could be the serial number or what ever uniquely identifies a device.

If we use lldb-platform in server mode, what user do we launch it as if we have no app? If we use the "shell" user? If so, is the platform connection useful still once we need to upload things into an application's data directory? Or would we need to launch another lldb-server in platform mode that runs as a user? Do we currently need to run lldb-server in platform mode as the user if we want to launch a debug session?

Seems like it would be cleaner to avoid using lldb-server and just have PlatformAndroid use any command line tools ("adb", etc) to run each command and each command can pick the user that would be required. With this patch in the current form we are kind of going the route of adding to the lldb-server, but is that the right decision? I want to avoid having multiple lldb-server binaries being run as multiple different users if we can avoid it. Thoughts?

I think that being able to debug an android device without needing to first copy&run an lldb-server there would be a great feature, and it's something I've wished for many times. Whether that also means ditching the lldb-server platform process is a more complicated question.

While some of the things that we use lldb-server platform for (copying files) can be easily done with adb, this is not generally true. I am afraid that if we try to do that, then we'll end up simply not being able to access some data, or needing to do complicated text scraping (which can be flaky). While it is not _that_ bad, this patch demonstrates both problems -- fetching the process architecture in this manner is pretty easy for lldb-server, but very hard over adb, and it also needs to account for differences in the "ps" output format on different devices.

Whether that means this approach should be scrapped? I don't know... If I were doing this, I'd probably try to keep it. But that's kind of not my business now...

What I do want to suggest is to think about a solid regression testing strategy early on. One of the ideas that come to mind is to create a mock adb server in python (similar to the mock gdb server in test/testcases/functionalities/gdb_client). This would allow you to simulate a variety of android devices and their responses to different commands (e.g. ps), and it would not require a real android device present at least for the most basic tests.

wallace abandoned this revision.Sep 27 2019, 9:45 AM

This diff has created a good discussion and I have a better idea of what we should do to move forward:)

  • I figured out a way to improve the existing process fetching functionality of the lldb-server so that it can show all the nice information that ps -A is showing. I'll just do some things ps is doing under the hood with /proc files.
  • I'll add some more fields to ProcessInstanceInfo that are android-specific, like package name, which is not the same as process name.
  • I'll implement a way for lldb to push an existing local lldb-server or to reuse one already in the device. That way lldb can start lldb-server on behalf of the user for two cases: the initial platform connection and the actual apk debugging, which requires an lldb-server started with the correct user
  • I'll implement a way for lldb to debug an apk bundle