This is an archive of the discontinued LLVM Phabricator instance.

Inliner Enhancement - Delayed Inlining
Needs RevisionPublic

Authored by yinma on Mar 19 2015, 4:29 PM.

Details

Summary

This strategy can be enabled with -mllvm -enable-delayed-inline.It is disabled by default.

In order to solve the extern A, B, C problem we implemented a subset of thedeferred inlining concept, named delayed inlining.The algorithm is below.

Given A->B->C as a simple example:

  1. Staring at processing SCC_B, if C can be inlined into B, B will be markedwith delayed flag only and return without doing actual inlining
  2. In SCC_A, if delayed B can be inlined into A, B will be inlined into Aand C will be inlined into A like in the normal inlining process.But B will be checked, if B has delayed flag, inling C->B will be pushed intothe queue to make sure C is correctly inlined into B.
  3. If delayed B cannot be inlined into A, C will be inlined into B,the updated B will be tested again as the final decision.

The general algorithm description:

  1. If all call sites in a SCC can be inlined, we will check if any functionin SCC will be visited again in the later inlining steps. If yes, all functionsin SCC will be marked delayed and return without doing actual inlining work.
  2. In later SCC processing, if B can be inlined into A, if B is adelayed function, we recursively make sure all callsites inside B will be pushedinto the queue and correctly inlined into B and B is correctly inlined into A.
  3. If a delayed B cannot be inlined into A, we recursively inline all callsitesinsided B into B. The updated B will be tested again as the final decision.
  4. In order to improve speed, we cache the inline cost computed for the bodyof a function F. for a call instruction to F,the cost currently is set to the min( the cached cost, callpenalty(25)).

The current inliner works on callsite level and defers inlining when the callercan be inlined into all caller's callers.This new algorithm works on function level and delays inlining when all calleescan be inlined into callers. This makes sure all delayed funcs can correctly bere-inlined later. Setting flags into function attribute assist the re-inliningprocessing. The new solution also has makeup steps.

This strategy has been only tested with SPEC workload on AArch64 for performance but correctness passed many other benchmarks.

Diff Detail

Event Timeline

yinma updated this revision to Diff 22322.Mar 19 2015, 4:29 PM
yinma retitled this revision from to Inliner Enhancement - Delayed Inlining.
yinma updated this object.
yinma edited the test plan for this revision. (Show Details)
chandlerc requested changes to this revision.Apr 14 2015, 6:28 AM
chandlerc edited edge metadata.

I don't think using attributes is the right approach here. I think this should just be a change to the core inlining algorithm, and how it processes candidates.

There are also no tests or any experimental results?

Also, please upload patches with full context.

This revision now requires changes to proceed.Apr 14 2015, 6:28 AM
yinma added a comment.May 18 2015, 3:12 PM

Hi Chandler, I agree that the attributes to save cost is not the right approach. I can work on this to fix it. However, we tried this algorithm for SPEC on ARM/AArch64. we saw improved results. However, when we tried this with LTO, we experienced mess regression. Largely due to too aggressive inlining because this algorithm will make some functions which cannot be inlined before inlined now. Some functions should be huge.

In general, delayed inlining is a way that boosts up inlining threshold for certain type of functions. And it is a way that always boosts up threshold. Sometimes, the boost up may be too big. Do you have any idea that we can improve to limit the boost up in the delayed process?

apazos resigned from this revision.Sep 18 2018, 12:18 PM