Early-inlining of recursive call makes the code size bloat exponentially. We should not disable it.
Details
Diff Detail
- Build Status
Buildable 7086 Build 7086: arc lint + arc unit
Event Timeline
It doesn't have to blow it up exponentially. Can we clone the function to be inlined, re-writing the recursive calls to call the clone, and then inline that? Similar to a worker-wrapper transformation
You mean clone the caller before any inlining, and inline that copy instead of the caller itself in early inlining? Yes we could do that, but that adds unnecessary copy overhead for every caller (which could be large). As llvm does not do recursive inlining anyway, this only solves cases where profile is collected from gcc binary, which should be temporary. So I guess it's not worth the effort to special-handle recursive early inlining.
For indirect recursive-call, yes. But those calls will not be by llvm inlined anyway. So promote to direct call does not help much.
I meant to do it only if the inline is recursive. The overhead should be much lower in that case.
Clone the function, make the recursive calls call the clone, inline the clone. Repeat until you no longer want to inline any more, then take the remaining calls to the clone, and change them back. GC the clone. You could think of it as a kind of loop unrolling.
lib/Transforms/IPO/SampleProfile.cpp | ||
---|---|---|
698 | Add a comment? |
yes, we need to make sure it does not cause build breakage.
This is still not-optimal. e.g.
foo() {
if (a) bar_large(); foo();
}
Suppose we first process bar_large() and inlined it. Then we start process foo and found it's recursive call, then we clone the foo body (with bar_large inlined). Then we recursively inline foo_clone into foo. But foo_clone already have bar_large(), which may not necessarily be cloned multiple times. A better solution would be
for (callsite cs : caller.callsites())
if (cs is recursive) clone caller
for (callsite cs : caller.callsites())
if (should early inline) if (cs is recursive) inline caller_clone into caller else inline callee into caller
But this is still not enough, as we may have foo calls bar, bar calls foo(). Which cannot be detected by the above algorithm. Of cause, we can process the SCC to find if caller could be early-inlined to be recursive, but that may be expensive, or make the logic complex.
Add a comment?