Allow user to specify --output-size-limit=n to cap the size of generated profile to be strictly under n. Functions with the lowest total sample count are dropped first if necessary. Due to using a heuristic, excessive functions may be dropped to satisfy the size requirement
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/tools/llvm-profdata/llvm-profdata.cpp | ||
---|---|---|
1087–1089 | This heuristic can be quite inaccurate. The number of body sample + call site sample entries can be much better proxy for actual profile size and yet still easily accessible. functionSamples.getBodySamples().size() + functionSamples.getCallsiteSamples().size() |
llvm/lib/ProfileData/SampleProfWriter.cpp | ||
---|---|---|
665 | Name table should only contains names exist in the current ProfileMap. The original implementation adds to the name table when writing a new profile, and the old names are never cleared, which is actually a bug. (However SampleProfileWriter is single use, an instance never calls write twice, so this bug was not showing up until this new feature) | |
llvm/tools/llvm-profdata/llvm-profdata.cpp | ||
1087–1089 | The new revision performs the iterations on a string buffer so the performance of the heuristic is not a big concern (and now it should not overshoot by reducing too many functions). Although using a simple proportional heuristic is still too slow because it ends up reducing one function at the tail at a time |
Refactored code structure
Use a string buffer to rewrite files
Added API for potential new strategy to reduce profile size
llvm/tools/llvm-profdata/llvm-profdata.cpp | ||
---|---|---|
1087–1089 | Better heuristics can be added in a later patch. Need more real world profile data (industrial use) to confirm which model is the best |
I have some high level questions:
(1) have you considered removing samples with smaller values across the program-- it's like downsampling. Comparing to removing functions with smaller total counts, I think that results in a more consistent profile.
(2) if we choose to remove function, and you sort the function with total count, should we find the exactly place to cut to satisfy the size limit? In theory it should as the profile organized in unit of function. You can keep writing to the buffer until it reaches the limits. Of cause there are some section data for extbinary and summary, but they should be able to compute. Using heuristic to guess the function to remove and doing it iteratively does not seem to be appealing here.
The biggest challenge to compute the number of functions accurately is the compression in extbinary, because the compressed size is non-linear to the original size. Since profile samples and function names are written to different sections (and in CS profile the names are split into two sections and samples can also be split into two sections), there is no way to predict ahead the offset between them. Based on use cases, the current heuristic is under estimating how many functions to prune (and the last iteration typically converges to pruning 1 function) so it's unlikely to remove too many functions. (Note: Also tried using cubic equation for heuristic but that will remove too many functions, so the optimal heuristic is between O(n^2) and O(n^3))
As for down sampling, having a sample count of 0 vs not having a sample means differently to the compiler, so that may change the branch basic block placement on hot functions, not sure if good idea.
I think it makes sense to start with straight-forward implementation and look for opportunities to refine it further in the future.
llvm/include/llvm/ProfileData/SampleProfWriter.h | ||
---|---|---|
135 | It looks like derived classes must always call this function. Can you add a comment here for the future? Maybe something like // This function must always be called by the overridden implementation? | |
llvm/lib/ProfileData/SampleProfWriter.cpp | ||
58 | I think this should be ValueType based on the guidance here: | |
llvm/test/tools/llvm-profdata/output-size-limit.test | ||
6 | I don't think we should use *-DAG here since that means the test will pass if the lines are reordered. However, if we reorder the symbol line with its contents the text format will be incorrect. I think the -DAG directive is useful if there is non-determinism in the text format where profile information for a whole symbol may appear before another symbol. In this case we should separate out the CHECKs like the example in [1]. Though we should fix non-deterministic output if this is the case. [1] https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-dag-directive | |
llvm/tools/llvm-profdata/llvm-profdata.cpp | ||
988 | How about moving this to SampleProfileWriter so that we can use this API in other tooling where we don't invoke llvm-profdata? | |
1035 | Perhaps wrap this in DEBUG(), I'm not sure whether it's useful to have this message all the time. | |
1084 | An if statement with initialization would make the intent clearer here: if (EC = File.error(); EC) { } | |
1218 | nit: Just heuristic instead of "heuristic algorithm"? |
llvm/include/llvm/ProfileData/SampleProfWriter.h | ||
---|---|---|
135 | It is not called by derived classes. It is called by llvm-profdata if the writer needs to be reused |
llvm/include/llvm/ProfileData/SampleProfWriter.h | ||
---|---|---|
135 | My comment was to add some documentation for future implementations which derive from SampleProfileWriter. For example, in this patch SampleProfileWriterText::reset calls SampleProfileWriter::reset(OS) on L110. I hope that clarifies the suggestion. |
llvm/tools/llvm-profdata/llvm-profdata.cpp | ||
---|---|---|
988 | Re-opening since I think we could move the RewriteProfileSizeLimit and CalculateNumFunctionsToRemove method to SampleProfileWriter as well so that this heuristic (and subsequent updates to it) can be reused in internal tooling directly. |
lgtm
Please wait a bit to see if others have additional comments. Thanks!
llvm/include/llvm/ProfileData/SampleProfWriter.h | ||
---|---|---|
45 | Perhaps specify that this is in bytes either in the variable name or a comment? | |
llvm/tools/llvm-profdata/llvm-profdata.cpp | ||
1216 | I think this is generally useful and we should make it visible. |
This breaks tests on Mac: http://45.33.8.238/macm1/52334/step_11.txt
Please take a look and revert for now if it takes a while to fix.
@huangjd the test you added seems to be failing on Windows. Can you take a look and revert if you need time to investigate?
llvm/lib/ProfileData/SampleProfWriter.cpp | ||
---|---|---|
125 | FYI, OriginalFunctionCount was unused but fixed in rG9f4a9d3f4450 | |
127 | IterationCount is used only here. |
llvm/test/tools/llvm-profdata/output-size-limit.test | ||
---|---|---|
61 | On Windows this seems to be expanded in a way you probably did not expect: |
How to do size check in a cross platform way?
llvm/include/llvm/ProfileData/SampleProfWriter.h | ||
---|---|---|
45 | Clarified comment in constructor | |
llvm/tools/llvm-profdata/llvm-profdata.cpp | ||
988 | CalculateNumFunctionToRemove is moved inside FunctionPruningStrategy since it won't be used anywhere else. It can be overriden if necessary |
Perhaps specify that this is in bytes either in the variable name or a comment?