This is an archive of the discontinued LLVM Phabricator instance.

[PGO] Suspend SIGKILL for PR_SET_PDEATHSIG in profile-write
ClosedPublic

Authored by xur on Feb 14 2017, 11:39 AM.

Details

Summary

We found a nondeterministic behavior when doing online profile merging
for multi-process applications. The application forks a sub-process and
sub-process sets to get SIGKILL when the parent process exits,

The first process gets the lock, and dumps the profile. The second one
will mmap the file, do the merge and write out the file. Note that before
the merged write, we truncate the profile.

Depending on the timing, the child process might be terminated
abnormally when the parent exits first. If this happens:
(1) before the truncation, we will get the profile for the main process
(2) after the truncation, and before write-out the profile, we will get
0 size profile.
(3) after the merged write, we get merged profile.

This patch temporarily suspend the SIGKILL for PR_SET_PDEATHSIG
before profile-write and restore it after the write.

This patch only applies to Linux system.

Event Timeline

xur created this revision.Feb 14 2017, 11:39 AM
davidxl added inline comments.Feb 14 2017, 12:18 PM
lib/profile/InstrProfilingFile.c
560

Put the platform dependent utility into InstProfilingUtil.c file.

void lprofSuspendSigKill(int *pdeathsig) {
#ifdef __linux__
...
#else 
#endif
}

void lprofRestoreSigKill(int pdeathsig) {
..
}
xur updated this revision to Diff 88418.Feb 14 2017, 12:55 PM

move platform depended code to InstrProfilingUnil.c per David's suggestion.

davidxl edited edge metadata.Feb 14 2017, 1:05 PM

Can you create a test case? It can be put under the Linux subdir.

xur updated this revision to Diff 88595.Feb 15 2017, 1:04 PM

Add a testcase

sing a fake count section to increase the profile size.
Main process gets to profile run-time first and dump the profile.
It will then exit when child process writes the profile.
Without this patch, there is a high chance of getting 0 size profile.

davidxl added inline comments.Feb 15 2017, 1:22 PM
test/profile/Linux/prctl.c
17 ↗(On Diff #88595)

format the test in llvm style.

23 ↗(On Diff #88595)

50 us or microsec

33 ↗(On Diff #88595)

10us

40 ↗(On Diff #88595)

Just check size of profile data? It is not guaranteed that child process will have a chance to dump.

xur updated this revision to Diff 88616.Feb 15 2017, 3:29 PM

Integrated David's review comments.

An addition change is the adding of 'sleep 5' b/w run and llvm-profdata
because if we run llvm-profdata immediately after. It's possible to get a
0-size files as it's being truncated by the child process (ie. the main
process exit while the child process is still accessing the file).

davidxl accepted this revision.Feb 15 2017, 11:31 PM

lgtm

test/profile/Linux/prctl.c
3 ↗(On Diff #88616)

sleep 5 seconds can be a problem -- it greatly increases the total runtime of check-profile. Please tune this down to 1 second (by reducing the fake data size).

This revision is now accepted and ready to land.Feb 15 2017, 11:31 PM
xur closed this revision.Feb 16 2017, 11:33 AM