This is an archive of the discontinued LLVM Phabricator instance.

[android/process info] Introduce display_name
ClosedPublic

Authored by wallace on Oct 14 2019, 7:42 PM.

Details

Summary

On systems like android, there are cases in which a process name can correspond to a package name (e.g. com.test.app), which is not an executable path, but instead a bundle id. In other cases, the names can correspond to special system processes whose executable path is not available (e.g. [intr_syncd]) or even kernel threads (e.g. [kworker/6:3]). So far ProcessInfo has been assuming that the process name is an executable path and for cases without executable path, no process name has been shown when invoking platform process list.
Now I'm adding a display_name field whose sole purpose is be the name to be shown when dumping ProcessInfo in cases like platform process list.
In the specific case of android, the bundle id is Arg0 when /proc/<pid>/exe is unreadable, so we can use it as display_name. For caess like system processes or kernel threads, the name is in /proc/<pid>/stat.

After this change, apk packages, system processes, and kernel threads appear in the process list

PID PARENT USER TRIPLE NAME

1 0 root /init
2 0 root [kthreadd]
3 2 root [ksoftirqd/0]
5 2 root [kworker/0:0H]
...
26176 2 root [irq/145-arm-smm]
26177 2 root [irq/146-arm-smm]
26308 2 root [kworker/2:5]
26598 926 u0_a111 com.samsung.android.app.cocktailbarservice:edgelighting
26758 926 u0_a102 com.samsung.android.messaging
26959 926 u0_a116 com.asurion.android.verizon.vms:DefendServiceVzSecure
30407 926 u0_a13 android.process.acore
31011 926 u0_a54 android.process.media
32209 926 u0_a223 com.facebook.katana

Event Timeline

wallace created this revision.Oct 14 2019, 7:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2019, 7:42 PM
wallace planned changes to this revision.Oct 14 2019, 11:14 PM

I think it would be nice to have a new member to ProcessInfo like:

std::string m_bundle_id;

This notion works for MacOS and for Android. When launching on iOS, we need to application bundle identifier to do the launching as the install of the app registers the app with the OS app launcher and is launched using its bundle identifier.

I would like to see the m_executable always contain the path to "app_process" binary. Arg0 is currently used when spawning via posix_spawn or fork/exec to control the name of the binary that is actually passed on the command line. For example you might specify "clang++" as your binary. If we resolved this to a binary we would find it resolves to "clang++@ -> clang", but we want to pass in clang++ as the arg0 so the compiler knows to run in C++ mode. So I would like Arg0 to be the actual argument that the client would like to be passed to the executable. Then m_bundle_id can be used by the platform launching code to launch the app. Also, by having m_executable set to "app_process", we can create a valid target in LLDB using the ProcessInfo and then launch/attach.

wallace updated this revision to Diff 225107.Oct 15 2019, 1:41 PM

now using bundle_id

wallace retitled this revision from [android/process list] use arg0 as fallback for process name to [android/process info] Introduce bundle id.Oct 15 2019, 1:43 PM
wallace edited the summary of this revision. (Show Details)
wallace updated this revision to Diff 225112.Oct 15 2019, 1:44 PM
wallace edited the summary of this revision. (Show Details)

nit

Harbormaster completed remote builds in B39602: Diff 225112.
Harbormaster completed remote builds in B39603: Diff 225113.

LGTM. Pavel?

labath added subscribers: danalbert, jingham.

Introducing a "bundle" identifier as a first class concept sounds reasonable to me, particularly if that concept can be applied to more than one platform. But since we're talking about iOS, it would be good to have the apple folks sign off on this idea too (+@jingham).

Independently, I am wondering if there's a better way to link the process id to a bundle. Using argv[0] might be ok if we're using it just for display purposes, but if we're going to be doing other stuff based on that identifier, it would be better to get it from a more reliable source. Unfortunately, I was not able to find a more "reasonable source", but maybe @danalbert has an idea.

I would like to see the m_executable always contain the path to "app_process" binary. Arg0 is currently used when spawning via posix_spawn or fork/exec to control the name of the binary that is actually passed on the command line. For example you might specify "clang++" as your binary. If we resolved this to a binary we would find it resolves to "clang++@ -> clang", but we want to pass in clang++ as the arg0 so the compiler knows to run in C++ mode. So I would like Arg0 to be the actual argument that the client would like to be passed to the executable. Then m_bundle_id can be used by the platform launching code to launch the app. Also, by having m_executable set to "app_process", we can create a valid target in LLDB using the ProcessInfo and then launch/attach.

FWIW, even in the previous implementation the idea was the "executable" field would point to the real exe (or be empty, if we can't locate it). And the argv array would still contain the best approximation of the arguments process was invoked with -- only an approximation, because on linux it's possible (and relatively common) for a process to overwrite it's argv[0] to alter the "ps" output. This is what we were trying to emulate by having GetName return argv[0] if the executable path was unknown.

lldb/source/Host/linux/Host.cpp
254–256

How sure are we that the app processes are the only ones which have the exe link unreadable? Will that be true on all phones or just the recent ones with all the selinux stuff?

I was hoping that there is some more reliable way of fetching this information, but there doesn't seem to be anything except ActivityManager.RunningAppProcessInfo, which I don't know if it can be accessed from lldb, either host- or device-side.

@danalbert, any ideas here?

lldb/source/Utility/ProcessInfo.cpp
45–46

What should be the order of preference here? It seems like it would be more natural to use "bundle id" as the "name" even if the executable path happens to be available/readable.

wallace marked an inline comment as done.Oct 16 2019, 11:16 AM
wallace added inline comments.
lldb/source/Host/linux/Host.cpp
254–256

Another option I've just discovered is to invoke pm list packages to get the list of all apks and if an Arg0 is in that list, then it's a package.
That seems good enough

wallace planned changes to this revision.Oct 16 2019, 11:19 AM

As Greg said, iOS (and macOS as well, though less directly) have the notion of bundleID. At present, lldb doesn't directly use/figure out the bundle ID, though it could either from the binary itself or from debugserver, which does have to know that. As far as I know we always also have the path, the bundle ID is inferred from the App package the binary path points to. So that complication should not be present on iOS.

Anyway, I don't see any problem with fetching and displaying the bundleID.

I don't think it would be good to make it hard to see the actual path (if you have it) as well as the bundle ID. If you are working on system components, you want to know that you are running some hand-built copy of a binary and not the pre-installed one. So we want to make it easy to see "which thing with this bundleID am I actually running". Other than that this seems fine to me.

I don't think it would be good to make it hard to see the actual path (if you have it) as well as the bundle ID. If you are working on system components, you want to know that you are running some hand-built copy of a binary and not the pre-installed one. So we want to make it easy to see "which thing with this bundleID am I actually running". Other than that this seems fine to me.

Thanks, Jim. This part is interesting. On android the application "path" is mostly uninteresting, because it will always be "/system/bin/app_proces" (as the "app" is really just a shared library loaded into that process), and AFAIK it is not even possible to run an application (an least on an unrooted phone) from a different location.

What would you say is the right way to display this in the "platform process list" output? Would it be ok if in non-verbose mode we displayed only one of these things (bundle id, path base name, argv[0], probably in that order), and then for the non-verbose mode listed each thing separately?

lldb/source/Host/linux/Host.cpp
254–256

That should prevent us accidentally setting an incorrect bundle id, but it does not prevent a process from deliberately changing its argv[0] to the name of some other installed package. That seems suboptimal, particularly if we're later going to use start using the bundle id for other than just display purposes (e.g. for issuing "am kill" commands). So, I am still wondering if we shouldn't go back to using argv[0] for the purpose of "process list", and leave the "bundle id" field for cases where we can get this information from a more reliable source (e.g. reading it from the apk, or fetching it from the android package manager, etc). What do you think?

danalbert added a subscriber: enh.Oct 17 2019, 12:50 PM

Independently, I am wondering if there's a better way to link the process id to a bundle. Using argv[0] might be ok if we're using it just for display purposes, but if we're going to be doing other stuff based on that identifier, it would be better to get it from a more reliable source. Unfortunately, I was not able to find a more "reasonable source", but maybe @danalbert has an idea.

@enh might

lldb/source/Host/linux/Host.cpp
254–256

Thanks for your feedback and to @clayborg for an offline discussion we had.

I'm going to add three new attributes to the ProcessInfo class, with corresponding parameters in the gdb-remote packet

  • display_name: This is a custom display name that the server can set for a given process. This will supersede any other other attribute in the GetProcessName call. There are interesting cases that are worth mentioning for Android:
    • Kernel threads and system processes are displayed like [kworker/u:0] [khubd] by ps on Android. At least on modern devices ps.c identify those processes as the ones with unreadable /proc/<pid>/cmdline and unreadable /proc/<pid>/exe. The name is gotten from /proc/<pid>/stat. There's some generic information about this here https://unix.stackexchange.com/questions/22121/what-do-the-brackets-around-processes-mean. In this case both the actual executable and args are not known, so it's better to leave those fields empty and store the display name in this new field.
    • Custom apk process names: apks can launch processes and subprocesses with custom names. The custom name has the form com.example.app:custom_name. It's always that way and it's documented here https://developer.android.com/guide/topics/manifest/application-element#proc (go to the android:process section). A given apk can have several subprocesses, each one with a different custom name, e.g. com.samsung.services:Core, com.samsung.services:Monitor, etc. In this case, the package name is known and a display_name field would be useful to store the entire custom name.
  • bundle_id: This will be exactly the apk package name or the iOS bundle name inferred by the server. I plan to make a best effort guess for modern android devices, leaving old devices as cases to implement as needed. I plan to use pm list packages for this, given that this is the only reliable way to get all the apk names. Besides, I'll check /proc<pid>/stat for the process name and not arg0. It seems that /stat is not modifiable the same way arg0 is.
  • bundle_path: On android this is the path the apk path. This will also be useful for iOS, because app bundles are also stored somewhere. Fortunately, the command pm list packages -f, shows the list of apks along with their corresponding paths on disk. This command doesn't work on old devices, but still, I wouldn't prioritize them now.

For the platform process list command, I'll make the GetProcessName function look for the actual name from these sources in the given order: display_name -> bundle_id -> exe path
In the case of the --verbose mode, I'll show all bundle information and display name fields

Thank you for doing the investigation, and the detailed summary. I'm going to respond inline.

Thanks for your feedback and to @clayborg for an offline discussion we had.

I'm going to add three new attributes to the ProcessInfo class, with corresponding parameters in the gdb-remote packet

  • display_name: This is a custom display name that the server can set for a given process. This will supersede any other other attribute in the GetProcessName call. There are interesting cases that are worth mentioning for Android:
    • Kernel threads and system processes are displayed like [kworker/u:0] [khubd] by ps on Android. At least on modern devices ps.c identify those processes as the ones with unreadable /proc/<pid>/cmdline and unreadable /proc/<pid>/exe. The name is gotten from /proc/<pid>/stat. There's some generic information about this here https://unix.stackexchange.com/questions/22121/what-do-the-brackets-around-processes-mean. In this case both the actual executable and args are not known, so it's better to leave those fields empty and store the display name in this new field.
    • Custom apk process names: apks can launch processes and subprocesses with custom names. The custom name has the form com.example.app:custom_name. It's always that way and it's documented here https://developer.android.com/guide/topics/manifest/application-element#proc (go to the android:process section). A given apk can have several subprocesses, each one with a different custom name, e.g. com.samsung.services:Core, com.samsung.services:Monitor, etc. In this case, the package name is known and a display_name field would be useful to store the entire custom name.

I think this is an excellent idea. It makes it clear that this is a separate concept, and that one should not rely on this being identical to any other concept (exe path, bundle it, etc.)

  • bundle_id: This will be exactly the apk package name or the iOS bundle name inferred by the server. I plan to make a best effort guess for modern android devices, leaving old devices as cases to implement as needed. I plan to use pm list packages for this, given that this is the only reliable way to get all the apk names. Besides, I'll check /proc<pid>/stat for the process name and not arg0. It seems that /stat is not modifiable the same way arg0 is.

Having a bundle_id field in this struct seems fine. I'm much less enthusiastic about lldb-server "inferring" its contents. In fact, after reading the android docs you linked above, I'm starting to become opposed to the idea. I'll copy the interesting bits of that document:

android:process
    The name of a process where all components of the application should run. Each component can override this default by setting its own process attribute.

    By default, Android creates a process for an application when the first of its components needs to run.

The way I read this is that this gives an application a fully documented way of changing it's process name to _anything_ it wants, including the name of another package (or at least, something that looks very similar to it). The app doesn't even have to go native to scribble into some random bytes in memory (which could be considered unsupported). I also think that going for /proc/<pid>/stat is a downgrade, as the name there comes from /proc/<pid>/comm, which is a file that can be written to by anyone with appropriate permissions (as opposed to argv[0], which requires permissions to write to the memory of that process -- a harder thing to come by). It is also limited to 16 characters, so it likely will not contain the full package name.

What's the end goal here? My impression was that you're trying to to make "platform process list" output useful on android, but it seems to me like this is already achieved by having the "display name" field. Can we postpone this part until we actually need it? I'm hoping that at that point it will become clearer whether these hac^H^Heuristics are really needed, or if there's another way to achieve the same goal. For instance when running doing a "process attach PID", we can (reliably) get the user id that PID runs under, and then use "pm list" to tie that user id to a package name that can be given to the "run-as" command -- no need to guess something in the implementation of GetProcessInfo.

  • bundle_path: On android this is the path the apk path. This will also be useful for iOS, because app bundles are also stored somewhere. Fortunately, the command pm list packages -f, shows the list of apks along with their corresponding paths on disk. This command doesn't work on old devices, but still, I wouldn't prioritize them now.

Likewise, the bundle_path concept seems fine, but I have doubts about the methods used to retrieve it, particularly as having two different apps share the same process is a legitimate and recommended way of reducing the memory footprint ("By setting this attribute to a process name that's shared with another application, you can arrange for components of both applications to run in the same process").

man, thanks for your feedback!
Indeed, i was trying to solve some problems that don't exist yet. Later when I add apk debugging support, I'll figure out what's the best way to retrieve apk specific information, but for now display_name is more than enough

sounds like we have a plan!

Cool. I'm glad we could figure this out.

wallace updated this revision to Diff 225947.Oct 21 2019, 1:31 PM
wallace retitled this revision from [android/process info] Introduce bundle id to [android/process info] Introduce display_name.
wallace edited the summary of this revision. (Show Details)

now using display_name instead of bundle_id. I've also changed the diff title and description

This looks pretty good. I just have one quick question about the /comm file.

lldb/source/Host/linux/Host.cpp
205–222

This should return the same value as /proc/<pid>/comm, except that you don't need to do any fancy parsing. Are there any cases where we cannot just read the /comm file ?

wallace marked an inline comment as done.Oct 28 2019, 11:35 AM
wallace added inline comments.
lldb/source/Host/linux/Host.cpp
205–222

you are right! I'll just use it directly

wallace updated this revision to Diff 226744.Oct 28 2019, 1:33 PM

now reading from comm

labath accepted this revision.Oct 29 2019, 12:14 AM
This revision is now accepted and ready to land.Oct 29 2019, 12:14 AM
wallace closed this revision.Nov 4 2019, 9:35 AM