This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profdata] Add option to cap profile output size
ClosedPublic

Authored by huangjd on Dec 7 2022, 8:23 PM.

Details

Summary

Allow user to specify --output-size-limit=n to cap the size of generated profile to be strictly under n. Functions with the lowest total sample count are dropped first if necessary. Due to using a heuristic, excessive functions may be dropped to satisfy the size requirement

Diff Detail

Event Timeline

huangjd created this revision.Dec 7 2022, 8:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 7 2022, 8:23 PM
huangjd requested review of this revision.Dec 7 2022, 8:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 7 2022, 8:23 PM

Add a test case for completeness.

llvm/lib/ProfileData/SampleProfWriter.cpp
631

unneeded change

639

unneeded change.

665

The clear call changes the behavior. Why is it needed?

llvm/tools/llvm-profdata/llvm-profdata.cpp
1068

Extract the following to a helper function.

wenlei added inline comments.Dec 8 2022, 11:56 AM
llvm/tools/llvm-profdata/llvm-profdata.cpp
1080–1082

This heuristic can be quite inaccurate. The number of body sample + call site sample entries can be much better proxy for actual profile size and yet still easily accessible.

functionSamples.getBodySamples().size() + functionSamples.getCallsiteSamples().size()

huangjd added inline comments.Dec 14 2022, 4:47 PM
llvm/lib/ProfileData/SampleProfWriter.cpp
665

Name table should only contains names exist in the current ProfileMap. The original implementation adds to the name table when writing a new profile, and the old names are never cleared, which is actually a bug. (However SampleProfileWriter is single use, an instance never calls write twice, so this bug was not showing up until this new feature)

llvm/tools/llvm-profdata/llvm-profdata.cpp
1080–1082

The new revision performs the iterations on a string buffer so the performance of the heuristic is not a big concern (and now it should not overshoot by reducing too many functions). Although using a simple proportional heuristic is still too slow because it ends up reducing one function at the tail at a time

huangjd updated this revision to Diff 483034.Dec 14 2022, 4:48 PM

Refactored code structure
Use a string buffer to rewrite files
Added API for potential new strategy to reduce profile size

huangjd added inline comments.Dec 14 2022, 4:51 PM
llvm/tools/llvm-profdata/llvm-profdata.cpp
1080–1082

Better heuristics can be added in a later patch. Need more real world profile data (industrial use) to confirm which model is the best

huangjd updated this revision to Diff 483275.Dec 15 2022, 11:45 AM

Added output size check

xur added a comment.Dec 15 2022, 2:05 PM

I have some high level questions:
(1) have you considered removing samples with smaller values across the program-- it's like downsampling. Comparing to removing functions with smaller total counts, I think that results in a more consistent profile.
(2) if we choose to remove function, and you sort the function with total count, should we find the exactly place to cut to satisfy the size limit? In theory it should as the profile organized in unit of function. You can keep writing to the buffer until it reaches the limits. Of cause there are some section data for extbinary and summary, but they should be able to compute. Using heuristic to guess the function to remove and doing it iteratively does not seem to be appealing here.

The biggest challenge to compute the number of functions accurately is the compression in extbinary, because the compressed size is non-linear to the original size. Since profile samples and function names are written to different sections (and in CS profile the names are split into two sections and samples can also be split into two sections), there is no way to predict ahead the offset between them. Based on use cases, the current heuristic is under estimating how many functions to prune (and the last iteration typically converges to pruning 1 function) so it's unlikely to remove too many functions. (Note: Also tried using cubic equation for heuristic but that will remove too many functions, so the optimal heuristic is between O(n^2) and O(n^3))

As for down sampling, having a sample count of 0 vs not having a sample means differently to the compiler, so that may change the branch basic block placement on hot functions, not sure if good idea.

The biggest challenge to compute the number of functions accurately is the compression in extbinary, because the compressed size is non-linear to the original size. Since profile samples and function names are written to different sections (and in CS profile the names are split into two sections and samples can also be split into two sections), there is no way to predict ahead the offset between them. Based on use cases, the current heuristic is under estimating how many functions to prune (and the last iteration typically converges to pruning 1 function) so it's unlikely to remove too many functions. (Note: Also tried using cubic equation for heuristic but that will remove too many functions, so the optimal heuristic is between O(n^2) and O(n^3))

I think it makes sense to start with straight-forward implementation and look for opportunities to refine it further in the future.

llvm/include/llvm/ProfileData/SampleProfWriter.h
133

It looks like derived classes must always call this function. Can you add a comment here for the future? Maybe something like // This function must always be called by the overridden implementation?

llvm/lib/ProfileData/SampleProfWriter.cpp
58
llvm/test/tools/llvm-profdata/output-size-limit.test
6

I don't think we should use *-DAG here since that means the test will pass if the lines are reordered. However, if we reorder the symbol line with its contents the text format will be incorrect.

I think the -DAG directive is useful if there is non-determinism in the text format where profile information for a whole symbol may appear before another symbol. In this case we should separate out the CHECKs like the example in [1]. Though we should fix non-deterministic output if this is the case.

[1] https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-dag-directive

llvm/tools/llvm-profdata/llvm-profdata.cpp
981

How about moving this to SampleProfileWriter so that we can use this API in other tooling where we don't invoke llvm-profdata?

1028

Perhaps wrap this in DEBUG(), I'm not sure whether it's useful to have this message all the time.

1077

An if statement with initialization would make the intent clearer here:

if (EC = File.error(); EC) {
}
1211

nit: Just heuristic instead of "heuristic algorithm"?

huangjd updated this revision to Diff 485555.Dec 28 2022, 4:35 PM

Updating D139603: [llvm-profdata] Add option to cap profile output size

huangjd marked an inline comment as done.Dec 28 2022, 4:37 PM
huangjd added inline comments.
llvm/include/llvm/ProfileData/SampleProfWriter.h
133

It is not called by derived classes. It is called by llvm-profdata if the writer needs to be reused

snehasish added inline comments.Dec 28 2022, 5:24 PM
llvm/include/llvm/ProfileData/SampleProfWriter.h
133

My comment was to add some documentation for future implementations which derive from SampleProfileWriter. For example, in this patch

SampleProfileWriterText::reset calls SampleProfileWriter::reset(OS) on L110.
SampleProfileWriterBinary::reset calls SampleProfileWriter::reset(OS) on L142.

I hope that clarifies the suggestion.

huangjd updated this revision to Diff 485653.Dec 29 2022, 5:33 PM
huangjd marked an inline comment as done.

Add comment clarifying reset() usage

huangjd marked 8 inline comments as done.Dec 29 2022, 5:35 PM
snehasish added inline comments.Jan 3 2023, 11:34 AM
llvm/tools/llvm-profdata/llvm-profdata.cpp
981

Re-opening since I think we could move the RewriteProfileSizeLimit and CalculateNumFunctionsToRemove method to SampleProfileWriter as well so that this heuristic (and subsequent updates to it) can be reused in internal tooling directly.

huangjd updated this revision to Diff 486443.Jan 4 2023, 6:39 PM

Refactor: moved implementation to llvm lib so that it can be used by other tools

snehasish accepted this revision.Jan 5 2023, 11:03 AM

lgtm

Please wait a bit to see if others have additional comments. Thanks!

llvm/include/llvm/ProfileData/SampleProfWriter.h
45

Perhaps specify that this is in bytes either in the variable name or a comment?

llvm/tools/llvm-profdata/llvm-profdata.cpp
1209

I think this is generally useful and we should make it visible.

This revision is now accepted and ready to land.Jan 5 2023, 11:03 AM
huangjd updated this revision to Diff 487550.Jan 9 2023, 1:57 PM
huangjd marked an inline comment as done.

Clarified comments

This revision was landed with ongoing or failed builds.Jan 9 2023, 2:01 PM
This revision was automatically updated to reflect the committed changes.
huangjd marked an inline comment as done.
thakis added a subscriber: thakis.Jan 9 2023, 5:30 PM

This breaks tests on Mac: http://45.33.8.238/macm1/52334/step_11.txt

Please take a look and revert for now if it takes a while to fix.

dyung added a subscriber: dyung.Jan 9 2023, 5:31 PM

@huangjd the test you added seems to be failing on Windows. Can you take a look and revert if you need time to investigate?

https://lab.llvm.org/buildbot/#/builders/216/builds/15534

chapuni added a subscriber: chapuni.Jan 9 2023, 7:54 PM
chapuni added inline comments.
llvm/lib/ProfileData/SampleProfWriter.cpp
125

FYI, OriginalFunctionCount was unused but fixed in rG9f4a9d3f4450

127

IterationCount is used only here.

dyung added inline comments.Jan 9 2023, 8:00 PM
llvm/test/tools/llvm-profdata/output-size-limit.test
62

On Windows this seems to be expanded in a way you probably did not expect:
(https://lab.llvm.org/buildbot/#/builders/216/builds/15553/steps/7/logs/FAIL__LLVM__output-size-limit_test)
"$(stat" "-c" "%s" "Z:\test\build\test\tools\llvm-profdata\Output\output-size-limit.test.tmp.output)"

@huangjd the test you added seems to be failing on Windows. Can you take a look and revert if you need time to investigate?

https://lab.llvm.org/buildbot/#/builders/216/builds/15534

How to do size check in a cross platform way?

llvm/include/llvm/ProfileData/SampleProfWriter.h
45

Clarified comment in constructor

llvm/tools/llvm-profdata/llvm-profdata.cpp
981

CalculateNumFunctionToRemove is moved inside FunctionPruningStrategy since it won't be used anywhere else. It can be overriden if necessary