This is an archive of the discontinued LLVM Phabricator instance.

[PGO] Use Unique Profile Files when New Processes are Forked
Needs ReviewPublic

Authored by qiongsiwu1 on Jul 14 2023, 6:25 AM.

Details

Summary

Currently, the PGO runtime only creates unique per process profile files when both of the following conditions are true

  1. __llvm_profile_initialize is triggered through global variable initialization (e.g. when exec starts an instrumented binary).
  2. When there are no existing file patterns set. See https://github.com/llvm/llvm-project/blob/bb6d60bf9dfea82814d9e50c5bb457c47d010611/compiler-rt/lib/profile/InstrProfilingFile.c#L795

This patch changes the behaviour so that a new profile file name is generated as soon as a fork system call creates a new process from an instrumented binary. The patch uses the pthread_atfork hook to register the initialization code, and modifies the file name pattern parsing logic so that when %p is present in the pattern, a unique file name can be generated even though the pattern already exists.

Additionally, the unique profile files are created before the profiling starts. If a process is not terminated gracefully through exit, an empty profile file for that process will exist, indicating that the profiling may be incomplete due to the process's abnormal termination.

Note that the unique profile per process is only generated when %p is present in the profile file name pattern. Additionally, this patch does not support Windows because the implementation relies on pthread.

Diff Detail

Event Timeline

qiongsiwu1 created this revision.Jul 14 2023, 6:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2023, 6:25 AM
Herald added subscribers: wlei, Enna1, wenlei. · View Herald Transcript
qiongsiwu1 requested review of this revision.Jul 14 2023, 6:25 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 14 2023, 6:25 AM
Herald added subscribers: Restricted Project, cfe-commits, MaskRay. · View Herald Transcript
qiongsiwu1 edited the summary of this revision. (Show Details)Jul 14 2023, 6:28 AM
qiongsiwu1 edited the summary of this revision. (Show Details)Jul 14 2023, 6:34 AM
qiongsiwu1 edited the summary of this revision. (Show Details)
davidxl added inline comments.Jul 14 2023, 11:56 AM
compiler-rt/lib/profile/InstrProfilingFile.c
795

what is this change for?

compiler-rt/test/profile/Posix/instrprof-fork.c
11

add some dump and check to make sure profiles are not truncated?

Revise the tests to use llvm tools to dump profile data to check for profile validity.

qiongsiwu1 marked 2 inline comments as done.Jul 17 2023, 6:45 AM
qiongsiwu1 added inline comments.
compiler-rt/lib/profile/InstrProfilingFile.c
795

This change is not relevant to the feature, rather a small simplification. If I read/tested the code correctly, we already set FilenamePat on line 787 if it is null, so here we no longer need to worry about that case.

compiler-rt/test/profile/Posix/instrprof-fork.c
11

Thanks for the comment! The test now uses llvm-profdata to show that the profile files are valid.

qiongsiwu1 marked 2 inline comments as done.Jul 17 2023, 6:46 AM

Minor comment spelling fix.

Ping for review. Thanks!!

Updating a test to see if it passes pre-commit CI.

Ping for review. Thanks so much!

Ping for review. We'd like to get this in to LLVM 17 if possible. Please let me know if there are comments/suggestions. Thanks so much!

w2yehia added inline comments.Jul 21 2023, 5:22 AM
clang/lib/Driver/ToolChains/AIX.cpp
440

The change in compiler-rt/lib/profile/InstrProfilingFile.c affects non-AIX platforms too, so won't they need -lpthreads too?

440

In order to avoid adding -lpthreads, we can leave it to the user to decide whether to add or not (i.e. they would add -lpthreads to their application's link command if they want to).
And so to avoid a link error due to undefined symbol, we can define pthread_atfork as a weak empty stub in the compiler-rt.

qiongsiwu1 marked 2 inline comments as done.Jul 21 2023, 7:27 AM
qiongsiwu1 added inline comments.
clang/lib/Driver/ToolChains/AIX.cpp
440

The change in compiler-rt/lib/profile/InstrProfilingFile.c affects non-AIX platforms too, so won't they need -lpthreads too?

This is a good question. On Linux we are ok because pthread_atfork is actually implemented by __register_atfork, which is available in libc.so, so we do not need -lpthreads. I am not sure about other platforms. @davidxl @vsk @ellis may I get your comments?

440

In order to avoid adding -lpthreads, we can leave it to the user to decide whether to add or not (i.e. they would add -lpthreads to their application's link command if they want to).
And so to avoid a link error due to undefined symbol, we can define pthread_atfork as a weak empty stub in the compiler-rt.

@w2yehia and I discussed offline, and we think it is ok to pass in -lpthread on AIX. There should not be any functional issues.

qiongsiwu1 marked 2 inline comments as done.Jul 21 2023, 7:27 AM

Ping for review. Thanks so much!

Ping for review. Thanks so much!

qiongsiwu1 added a comment.EditedAug 1 2023, 7:12 AM

Ping for review. Thanks!

Specifically, note that we need to add -lpthread to the tool chain on AIX to make sure the program can link properly. On Linux we do not need this extra option. What about other platforms, such as macOS?

Ping for review.

If there are no comments on whether we should add -lpthread for platforms other than AIX, I will land this patch soon, and let the official buildbots check for problems. If there are problems, I will let the bots run a bit (probably a day or so) before reverting to catch as many platforms I need to fix as possible, revert, and fix the patch.

Thanks!

Ping for review.

If there are no comments on whether we should add -lpthread for platforms other than AIX, I will land this patch soon, and let the official buildbots check for problems. If there are problems, I will let the bots run a bit (probably a day or so) before reverting to catch as many platforms I need to fix as possible, revert, and fix the patch.

Thanks!

I will take a look as well but please don't land this -lpthread change for non-AIX platforms. I will think about the implication.

Ping for review.

If there are no comments on whether we should add -lpthread for platforms other than AIX, I will land this patch soon, and let the official buildbots check for problems. If there are problems, I will let the bots run a bit (probably a day or so) before reverting to catch as many platforms I need to fix as possible, revert, and fix the patch.

Thanks!

I will take a look as well but please don't land this -lpthread change for non-AIX platforms. I will think about the implication.

Thanks @MaskRay ! I am putting this patch on hold till I hear back. The patch does not alter non-AIX platforms at the moment.

MaskRay added inline comments.Aug 12 2023, 8:32 AM
compiler-rt/lib/profile/InstrProfilingFile.c
94

Revert unneeded change.

253

include stdbool.h and use bool

1027

unneeded change

compiler-rt/test/profile/Posix/instrprof-fork.c
4

I usually use RUN: rm -rf %t && mkdir %t && cd %t and then use something like -o t to place the executable under %t as well.

22

parent and child have the same logic. Use:

if (pid == -1)
  return 1;
printf("%ld.profraw\n", (long)getpid());

Ping for review.

If there are no comments on whether we should add -lpthread for platforms other than AIX, I will land this patch soon, and let the official buildbots check for problems. If there are problems, I will let the bots run a bit (probably a day or so) before reverting to catch as many platforms I need to fix as possible, revert, and fix the patch.

Thanks!

I will take a look as well but please don't land this -lpthread change for non-AIX platforms. I will think about the implication.

Thanks @MaskRay ! I am putting this patch on hold till I hear back. The patch does not alter non-AIX platforms at the moment.

The pthread_atfork undefined reference will cause a problem using glibc<2.34 where pthread_atfork lives in libpthread instead of libc.
I think non-AIX-non-Windows systems can live with an extra -lpthread. For ELF systems (not Apple's among non-AIX-non-Windows systems), -lpthread must be after libclang_rt.profile.a.

compiler-rt/test/profile/Posix/instrprof-fork.c
22

Please ignore my previous comment. I think we want to have a deterministic parent / child output order. Consider adding a wait for the parent process so that we always see the child printf("%ld.profraw\n", (long)getpid()); output before the parent one.

We should inspect more information of the two raw profile files. Create a function only called by the parent and another function only called by the child. Check their counters.

We also need a test if the child process spawns another process.

MaskRay added a comment.EditedAug 12 2023, 10:04 PM

Actually, I am not sure changing %p to create multiple raw profile files should be the default.
Currently, child counters are attributed to the raw profile file derived from the parent process id. Thanks to online profile merging we get just one raw profile file (though counters before fork may be doubly incremented).

Using %p should not magically change the behavior and I am unsure the result is better for PGO.
Can you elaborate separate profile files are going to be useful?
It we decide to provide such a mode, it needs to be opt-in by adding a new format specifier or API.

qiongsiwu1 added a comment.EditedAug 14 2023, 9:49 AM

Using %p should not magically change the behavior and I am unsure the result is better for PGO.
If we decide to provide such a mode, it needs to be opt-in by adding a new format specifier or API.
Can you elaborate separate profile files are going to be useful?

Thanks again for taking a look at this patch!

Sure! We are generating separate profiles through %p for two reasons. First, we think that %p implies a profile per process, even if exec is not called after fork. It is unexpected that with %p, a program that forks child processes only create one profile file from the parent. This is why we did not create a new API or create a new pattern. Second, we are trying to catch non-gracefully terminated processes during profile generate. We observed a scenario where a hot function having all 0 counters. The reasons was the child process was created by a fork (but no exec followed), and then was terminated probably by a signal, instead of going through exit(). Such a process did not have a chance to write back the profile for merging. The parent process took a different call path and never called the hot function. We would like to detect such a situation for a particular process. Hence this patch creates an empty profile file if a process is not terminated gracefully. Using %p is probably the most natural choice when debugging such issues.

One can argue that we may look into the continuous mode to manage abnormal terminations during profile generate. I have not looked too closely at that, but we think it is still useful for a user to tell that some processes did not write any profile back, for debugging, or for diagnostics.

Can we avoid the use of pthread_atfork and the dependency on pthreads? Using pthreads inside the profile runtime implementation is problematic in the case where you want to instrument the libc itself which is used on some platforms and it's one of the reasons why we would like to eliminate the dependency on libc altogether and use sanitizer_common instead, see https://discourse.llvm.org/t/using-c-and-sanitizer-common-in-profile-runtime/58629.

Using %p should not magically change the behavior and I am unsure the result is better for PGO.
If we decide to provide such a mode, it needs to be opt-in by adding a new format specifier or API.
Can you elaborate separate profile files are going to be useful?

Thanks again for taking a look at this patch!

Sure! We are generating separate profiles through %p for two reasons. First, we think that %p implies a profile per process, even if exec is not called after fork. It is unexpected that with %p, a program that forks child processes only create one profile file from the parent. This is why we did not create a new API or create a new pattern. Second, we are trying to catch non-gracefully terminated processes during profile generate. We observed a scenario where a hot function having all 0 counters. The reasons was the child process was created by a fork (but no exec followed), and then was terminated probably by a signal, instead of going through exit(). Such a process did not have a chance to write back the profile for merging. The parent process took a different call path and never called the hot function. We would like to detect such a situation for a particular process. Hence this patch creates an empty profile file if a process is not terminated gracefully. Using %p is probably the most natural choice when debugging such issues.

One can argue that we may look into the continuous mode to manage abnormal terminations during profile generate. I have not looked too closely at that, but we think it is still useful for a user to tell that some processes did not write any profile back, for debugging, or for diagnostics.

Handling cases like these is exactly why the continous mode was created. Your solution only covers a single case, but there's many scenarios where process can terminate abnormally and fail to invoke atexit hooks which would results in profile not being written out.

Thanks for chiming in @phosek !

Can we avoid the use of pthread_atfork and the dependency on pthreads? Using pthreads inside the profile runtime implementation is problematic in the case where you want to instrument the libc itself which is used on some platforms and it's one of the reasons why we would like to eliminate the dependency on libc altogether and use sanitizer_common instead, see https://discourse.llvm.org/t/using-c-and-sanitizer-common-in-profile-runtime/58629.

Yes that is something I will try. The case illustrated in https://discourse.llvm.org/t/using-c-and-sanitizer-common-in-profile-runtime/58629 makes sense. Is there something similar to pthread_atfork in sanitizer_common you would recommend? This functionality needs a hook at fork, and it does not really rely on pthreads.

Using %p should not magically change the behavior and I am unsure the result is better for PGO.
If we decide to provide such a mode, it needs to be opt-in by adding a new format specifier or API.
Can you elaborate separate profile files are going to be useful?

Thanks again for taking a look at this patch!

Sure! We are generating separate profiles through %p for two reasons. First, we think that %p implies a profile per process, even if exec is not called after fork. It is unexpected that with %p, a program that forks child processes only create one profile file from the parent. This is why we did not create a new API or create a new pattern. Second, we are trying to catch non-gracefully terminated processes during profile generate. We observed a scenario where a hot function having all 0 counters. The reasons was the child process was created by a fork (but no exec followed), and then was terminated probably by a signal, instead of going through exit(). Such a process did not have a chance to write back the profile for merging. The parent process took a different call path and never called the hot function. We would like to detect such a situation for a particular process. Hence this patch creates an empty profile file if a process is not terminated gracefully. Using %p is probably the most natural choice when debugging such issues.

One can argue that we may look into the continuous mode to manage abnormal terminations during profile generate. I have not looked too closely at that, but we think it is still useful for a user to tell that some processes did not write any profile back, for debugging, or for diagnostics.

Handling cases like these is exactly why the continous mode was created. Your solution only covers a single case, but there's many scenarios where process can terminate abnormally and fail to invoke atexit hooks which would results in profile not being written out.

Yes I agree there is overlap with continuous mode. That said, there is still the first reason that we need to implement this feature, i.e. no new profile file is created at fork when %p is present in the file name pattern. My understanding is that %p implies a profile file per process created, see the new test case compiler-rt/test/profile/Posix/instrprof-fork.c. My expectation when running the test case is that we should have two different profile files, while currently we only have one profile file created. As @MaskRay pointed out in his comment for this test case, the profile files generated for the two processes can be very different. In the context of fork without exec, the current implementation will only generate one profile file. Effectively the result is identical with the profile generated without %p, if I understand the current implementation correctly. Additionally, continuous mode is not available on all platforms (for example AIX which is our use case). We are working on getting the continuous mode online, but %p as a debugging feature in the context of fork without exec seems to be useful.

With this said, if it is indeed our intention that new profile files should not be created upon fork when %p is present in the file pattern, I am happy to choose alternative routes. I would love to know why we have such an intention before moving on. I understand that this is an existing feature so we may be hesitant to change its behavior, and we would rather not have dependency on pthread, which is an implementation issue. Are there additional reasons that we prefer the current behavior?

Thanks again for the input!