Page MenuHomePhabricator

[CSSPGO][llvm-profgen] Merge and trim profile for cold context to reduce profile size

Authored by wlei on Jan 5 2021, 12:22 PM.



This change allows merging and trimming cold context profile in llvm-profgen to solve profile size bloat problem. Currently when the profile's total sample is below threshold(supported by a switch), it will be considered cold and merged into a base context-less profile, which will at least keep the profile quality as good as the baseline(non-cs).

For example, two input profiles:
[main @ foo @ bar]:60
[main @ bar]:50
Under threshold = 100, the two profiles will be merge into one with the base context, get result:

Added two switches:
--csprof-cold-thres=<value>: Specified the total samples threshold for a context profile to be considered cold, with 100 being the default. Any cold context profiles will be merged into context-less base profile by default.
--csprof-keep-cold: Force profile generation to keep cold context profiles instead of dropping them. By default, any cold context will not be written to output profile.

Though not yet evaluating it with the latest CSSPGO, our internal branch shows neutral on performance but significantly reduce the profile size. Detailed evaluation on llvm-profgen with CSSPGO will come later.

Diff Detail

Event Timeline

wlei created this revision.Jan 5 2021, 12:22 PM
wlei requested review of this revision.Jan 5 2021, 12:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2021, 12:22 PM
wlei edited the summary of this revision. (Show Details)Jan 5 2021, 3:21 PM
wlei added reviewers: wenlei, wmi, davidxl, hoy.
wmi added inline comments.Jan 10 2021, 11:20 PM

I see "[main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5 @ fb]:9:4" is trimmed but [fa]:14:4 is kept, so the threshold should be 10?


emplace_back(I.getKey(), &I.second)?

wlei updated this revision to Diff 316228.Jan 12 2021, 1:52 PM
wlei edited the summary of this revision. (Show Details)

Rebase and addressing Wei's feedback

wlei added inline comments.Jan 12 2021, 1:53 PM

The second test is for --csprof-keep-cold, which means we only merge the cold profile but don't remove it.
So I intended to use a very high threshold(100) to let all the profiles merge together and keep it.


Good catch, fixed

wmi added inline comments.Jan 14 2021, 12:09 PM

Ok, I see. Thanks for explaining it.

It means if the total count of a profile with context is smaller than the csprof-cold-thres, it will be merged into its base profile. If the total count of the base profile after all merge is done is still smaller than the csprof-cold-thres, it will be trimmed unless csprof-keep-cold flag is specified. Is it correct? Better document it more clearly in the flag desc.

wlei added inline comments.Jan 14 2021, 12:28 PM

Yes, that's exactly what I mean. Will update the flag desc, thanks for your feedback.

wlei updated this revision to Diff 316791.Jan 14 2021, 3:03 PM

Refine the cl::desc for CSProfColdThres and CSProfKeepCold

wmi accepted this revision.Jan 14 2021, 3:31 PM

LGTM. Thanks.

This revision is now accepted and ready to land.Jan 14 2021, 3:31 PM
hoy accepted this revision.Jan 15 2021, 10:00 AM
hoy added inline comments.

Nit: Specify

I was wondering in the future if we'd like a consistent way of computing cold threshold with SampleProfileSummaryBuilder, i.e., counts below 99.9% (aka cut-off) are considered cold, so that the profile generator and the profile consumer will be on the same page.

wlei updated this revision to Diff 317079.Jan 15 2021, 2:35 PM

fix typo and rebase

wlei added inline comments.Jan 15 2021, 2:40 PM

Thanks for sharing this. I'm not sure how we get the 100 threshold, I guess this is the result of many iterations. Let me dig into the SampleProfileSummaryBuilder to see the chance.

hoy added inline comments.Jan 15 2021, 4:29 PM

Yes, it is a number from experiments. The logic in the profile summary builder is a bit different in that it sorts all samples by weight and treats the samples below a given percentage (say 0.01%) as cold. In order words, if there are more than 99.99% samples that are hotter than the current one, then the current is treated as cold. That's the logic used in the compiler.

wenlei added inline comments.Jan 20 2021, 9:29 AM

Yeah, +1 for what Hongtao said, I wanted to use %-based cutoff, but didn't have access to SampleProfileSummaryBuilder from early prototype.

I think create_llvm_profile tool have both sample_threshold_frac and sample_threshold_absolute? Similarly both could be useful here. We could refine them as we broaden the adoption and continue tuning - I don't have an opinion whether we make all these improvements here or in another patch.

wlei added inline comments.Jan 20 2021, 10:40 AM

Cool, thanks for your informative suggestions! will have a try and add more experiments on SPEC.

This revision was landed with ongoing or failed builds.Thu, Feb 4, 11:05 AM
This revision was automatically updated to reflect the committed changes.