This is an archive of the discontinued LLVM Phabricator instance.

[profile] Fix value profile runtime merging issues
ClosedPublic

Authored by xur on Mar 23 2018, 2:48 PM.

Details

Summary

This patch fixed two issues:
(1) value profile merging did not worked as the strong definition of the merge hook function never worked. We were not doing the merging of value profile. This patch remove the weak attribute and assign the hook dynamically.
(2) in case value profile is not merged, truncated the file so that we don't have garbage in the end of the profile.

Diff Detail

Event Timeline

xur created this revision.Mar 23 2018, 2:48 PM
davidxl added inline comments.Mar 23 2018, 2:56 PM
lib/profile/InstrProfilingFile.c
241

non value profiling data has fixed size, so why is this needed?

xur added inline comments.Mar 23 2018, 3:10 PM
lib/profile/InstrProfilingFile.c
241

Because there are still value counts in both in-memory and in-disk profiles.

Using the test in the patch as the example (running on my machine).
assuming the binary is named as 'gen' and VPMergeHook null.

if we don't do online merge,
"gen" will generate a profile of size 320 bytes.
"gen 1" will generate a profile with size of 352 bytes

If we run "gen 1 && gen", the result binary is still 352 bytes.
But the effective size is 320 bytes. The tail 32 bytes are trash.

davidxl added inline comments.Mar 23 2018, 3:16 PM
lib/profile/InstrProfilingFile.c
241

Without VpMergeHook, we should assert that there is no value profile data -- it should not be dumped in the first place.

xur added inline comments.Mar 24 2018, 8:55 AM
lib/profile/InstrProfilingFile.c
241

current behavior is to write out the in-memory value profile data and discard the value profile data in the file. if there is not value profile from the file, the result is still correct.

I'm not familiar with the use cast the not having libc (where VPMergeHook will be nullptr). Is it not suppose to have any value profile?

I can assert VPMergeHook==nullptr and in-memory valuesites are not empty. That will be easy. Is this what you suggested?

davidxl added inline comments.Mar 24 2018, 9:39 AM
lib/profile/InstrProfilingFile.c
241

In lprofGetVPDataReader method, returns null if VPMergeHook is null.

xur added inline comments.Mar 26 2018, 11:24 AM
lib/profile/InstrProfilingFile.c
241

This will generate incorrect profile: the valuesite in the data section says there are valuesites but VP count is empty.

if we do this, the behavior would be: llvm-profdata show or merge command will stop silently at the first function that has the value-site instrumentation. The rest profile will be discarded.

Note, of course, this only applies to VPMergeHook==nullptr case.

davidxl added inline comments.Mar 26 2018, 12:18 PM
lib/profile/InstrProfilingFile.c
241

On the other hand, it seems safe to do truncation here unconditionally before the writing later.

davidxl added inline comments.Mar 27 2018, 2:43 PM
lib/profile/InstrProfilingFile.c
241

one correction. When Merge is done, the namedata section will be skipped during dumping to speed up the profile write. Make sure you do the truncation to the right size.

xur updated this revision to Diff 140479.Mar 30 2018, 1:35 PM

This is the updated patch. It fixed some issues in previous patch. It also adds a new API for updating value-count in batch mode. Without this, PGO instrumented Clang incurs 20x slow down in building itself (with %9m prefix -- with %m is even greater slowdown ).

davidxl added inline comments.Mar 30 2018, 1:42 PM
lib/profile/InstrProfilingValue.c
136

Is this call getting inlined in the library build? If not, it will introduce unnecessary overhead.

One way to avoid this is to define a static always inline function, and then make the two APIs as wrappers to that common function.

I checked the assembly in
clang_rt.profile-i386.dir/ and
clang_rt.profile-x86_64.dir/

Both are inlined.

xur added a comment.Mar 30 2018, 1:55 PM

My previous refers to using gcc as the compiler.

With llvm (itself, i.e. trunk) as the compiler, it's not inlined.

I will update the patch using a wrapper.

xur updated this revision to Diff 140485.Mar 30 2018, 2:12 PM

using wrappers to always_inline function to avoid function call overhead.

davidxl added inline comments.Mar 30 2018, 2:17 PM
lib/profile/InstrProfilingValue.c
135–137

need to put the attribute((..)) into InstrProfilingPort.h under GNUC section.

davidxl added inline comments.Mar 30 2018, 2:28 PM
lib/profile/InstrProfilingMerge.c
40

There is an existing interface for this:

__llvm_profile_get_size_for_buffer_internal

lib/profile/InstrProfilingValue.c
157

+= CountValue ?

xur updated this revision to Diff 140495.Mar 30 2018, 2:53 PM

move inline attribute into InstrProfilingPort.h

xur added inline comments.Mar 30 2018, 3:12 PM
lib/profile/InstrProfilingMerge.c
40

using that API needs to pass the begin and end address for each section.
Here we use all the counters recorded in the header.

lib/profile/InstrProfilingValue.c
157

yes. I cannot believe this is not caught by all the tests.

xur updated this revision to Diff 140498.Mar 30 2018, 3:17 PM

integrated David's review comments

davidxl added inline comments.Mar 30 2018, 3:42 PM
lib/profile/InstrProfilingMerge.c
40

Those pointers are available via llvm_profile_begin_xxx /llvm_profile_end_xxx.

It is better to avoid duplicating the logic of computing the profile data size.

xur updated this revision to Diff 140508.Mar 30 2018, 4:46 PM

using __llvm_profile_get_size_for_buffer_internal directly to get the profile size.

xur updated this revision to Diff 140509.Mar 30 2018, 4:56 PM

Fixed type cast warnings.

davidxl added inline comments.Mar 30 2018, 5:48 PM
lib/profile/InstrProfilingFile.c
189

What I suggested is to use public APIs:

llvm_profile_begin_data(), llvm_profile_end_data(), llvm_profile_begin_counters(), llvm_profile_end_counters(), so that you don't even need to to assume the raw profile layout here.

xur marked an inline comment as done.Mar 30 2018, 9:46 PM
xur added inline comments.
lib/profile/InstrProfilingFile.c
189

you are right! as we have checked the compatibility, both versions should have the same format.

xur updated this revision to Diff 140526.Mar 30 2018, 9:47 PM
xur marked an inline comment as done.

integrated David's review comments to use an available api.

This revision is now accepted and ready to land.Mar 30 2018, 10:11 PM
This revision was automatically updated to reflect the committed changes.

Hi,
It appears the commit related to this review has cause a bot failure on
Green Dragon. This failure was masked by an earlier failure which was
reverted a short time ago. Please have a look at:
http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/44030/ and
either revert your commit or notify me if you will be submitting a patch
within the next hour.

Respectfully,
Mike Edwards