This is an archive of the discontinued LLVM Phabricator instance.

[WIP] [ThinLTO] Importing function by function specialization
Abandoned · Public

Authored by ChuanqiXu on Jul 14 2021, 12:26 AM.

Details

Summary

This diff is the successor to D105524, which extracts the analysis part of function specialization into an independent analysis.

The intention of this diff is to enable ThinLTO to import functions based on function specialization heuristics.
Currently, ThinLTO imports functions from other compilation units (CUs) based on inlining heuristics only.
Put simply, ThinLTO by default imports functions that are shorter than 100 lines of code (LoC) to enable potential inlining.
This makes sense because inlining is, in fact, the main driver of IPO.
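
As a rough illustration of the size-only rule described above, here is a minimal sketch. The names `DefaultImportSizeLimit` and `isImportCandidateForInlining` are hypothetical and are not ThinLTO APIs; only the default of 100 is taken from the summary.

```cpp
#include <cassert>

// Hypothetical stand-in for the size-based import rule described above.
// Only the value 100 comes from the summary; the names are illustrative.
constexpr unsigned DefaultImportSizeLimit = 100;

// Returns true when a callee defined in another CU is small enough to be
// imported as a potential inlining candidate. Size is the only input here,
// which is why larger functions are never considered for import.
inline bool isImportCandidateForInlining(unsigned CalleeSize,
                                         unsigned Limit = DefaultImportSizeLimit) {
  assert(Limit > 0 && "a zero limit would reject every callee");
  return CalleeSize < Limit;
}
```

Under this rule a callee of size 40 would be imported, while one of size 250 would not, no matter how profitable specializing it might be.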

Function specialization is another important IPO technique, and it was implemented recently.
My goal is to make ThinLTO import functions so that function specialization becomes possible.
The key point is that we need to record the information the importer needs to judge whether it is beneficial to import a function whose LoC exceeds 100 lines.

Here is the extra information I planned to add:

  • Specialize Function Cost. An unsigned number that estimates the cost of cloning one function.
  • Base Bonus for specializing a specific argument. A map from ArgNo to the corresponding base bonus. The base bonus is the bonus that can be calculated by visiting the function body alone (in other words, without seeing the call site).
  • ArgUsage. A map from ArgNo to the extra bonus for each call site. The extra bonus is the bonus that can only be calculated for a specific call site. For example, if a function is passed as an argument in the call, the possibility that this function gets inlined can only be calculated at that specific call site.
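
The three pieces of information above could be combined roughly as in the following sketch. The type and function names (`SpecializationSummary`, `ArgUsageMap`, `specializationBonus`, `worthImportingForSpecialization`) are hypothetical and are not taken from the patch, and the decision rule "import when the bonus for some argument exceeds the clone cost" is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <map>

// Hypothetical per-function record stored alongside the function summary.
struct SpecializationSummary {
  unsigned CloneCost = 0;                  // estimated cost of cloning the function
  std::map<unsigned, unsigned> BaseBonus;  // ArgNo -> bonus derived from the body alone
};

// Hypothetical per-call-site record: ArgNo -> extra bonus that is only
// known at this particular call site.
typedef std::map<unsigned, unsigned> ArgUsageMap;

// Looks up Key in M and returns 0 when no bonus is recorded for it.
inline unsigned lookupOrZero(const std::map<unsigned, unsigned> &M, unsigned Key) {
  std::map<unsigned, unsigned>::const_iterator It = M.find(Key);
  return It == M.end() ? 0u : It->second;
}

// Total bonus for specializing on ArgNo at one call site:
// the body-only base bonus plus the call-site-specific extra bonus.
inline unsigned specializationBonus(const SpecializationSummary &S,
                                    const ArgUsageMap &Usage, unsigned ArgNo) {
  return lookupOrZero(S.BaseBonus, ArgNo) + lookupOrZero(Usage, ArgNo);
}

// Assumed decision rule (not from the patch): importing is worthwhile if
// specializing on at least one argument yields more bonus than the clone costs.
inline bool worthImportingForSpecialization(const SpecializationSummary &S,
                                            const ArgUsageMap &Usage) {
  std::map<unsigned, unsigned>::const_iterator It;
  for (It = S.BaseBonus.begin(); It != S.BaseBonus.end(); ++It)
    if (specializationBonus(S, Usage, It->first) > S.CloneCost)
      return true;
  // Arguments that only carry a call-site bonus are checked as well.
  for (It = Usage.begin(); It != Usage.end(); ++It)
    if (specializationBonus(S, Usage, It->first) > S.CloneCost)
      return true;
  return false;
}
```

The point of the split is that the clone cost and base bonus can be computed once in the defining module, while the extra bonus has to be computed per call site in the importing module, so the importer can decide without seeing the function body.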

Earlier, in https://lists.llvm.org/pipermail/llvm-dev/2021-May/150443.html, I raised some problems we might run into:

  • We can't see the function body before we import it.
  • The call graph would be traversed repeatedly, once in each translation unit, which is very redundant.
  • The same specialized version of a function may be produced more than once, which makes the code larger and redundant.

The first problem would be solved by D105524 and this patch.
The second problem does not seem to be severe. Except for 502.gcc_r, whose compile time (CT) increased by 30%, we didn't observe any significant CT change in SPEC2017 int.
As for the third problem, the average code size increase in SPEC2017 int is 6.7%, and the maximum is 19%.

These numbers may not be very satisfying, and the function specialization pass is not yet mature. That is why I marked this patch as [WIP].

I am uploading the patch now because I want to make sure that I am heading in the right direction.

Finally, there are some TODOs in this patch:

  • The writer and parser for .ll files are not implemented yet.
  • Need to add and fix tests.

Test Plan: SPEC2017 int rate.

Diff Detail