This patch extends the inliner to more strongly prefer inlining functions whose inner-loop trip counts become small constants in the inlined code.
For example:
    extern int foo(int m, int n, float b[n], float a[m][n]) {
      int i, j;
      for (i = 0; i < n; i++)
        for (j = 0; j < m; j++)
          a[j][i] = a[j][i] + b[i];
      return 0;
    }

    extern int main() {
      float b[4], a[3][4];
      …
      foo(3, 4, b, a);
      …
    }
Inlining foo will expose the inner loop bound of 3. Such loops can be good candidates for complete unrolling.
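To illustrate why a constant inner bound matters, here is a minimal C++ sketch of what the loop nest could look like once m = 3 and n = 4 are known at the call site, next to a fully unrolled equivalent. The names M, N, Matrix, Vector, addRolled, and addUnrolled are hypothetical and not part of the patch. The sketch indexes a as a[j][i] so that accesses stay within a 3x4 array.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins for the call-site constants foo(3, 4, b, a).
constexpr std::size_t M = 3; // rows of a, inner-loop trip count
constexpr std::size_t N = 4; // columns of a, outer-loop trip count

using Matrix = std::array<std::array<float, N>, M>;
using Vector = std::array<float, N>;

// Loop nest as it might look right after inlining: both bounds are constants.
void addRolled(Matrix &a, const Vector &b) {
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < M; ++j)
      a[j][i] += b[i];
}

// Same computation with the 3-iteration inner loop completely unrolled.
void addUnrolled(Matrix &a, const Vector &b) {
  for (std::size_t i = 0; i < N; ++i) {
    a[0][i] += b[i];
    a[1][i] += b[i];
    a[2][i] += b[i];
  }
}
```

Both functions compute the same result. The unrolled form drops the inner loop's compare and branch, which is only possible once the bound is a known small constant.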
The function convolve() in the Geekbench sharpen_filter benchmark is another such example.
The maximum trip count is defined in include/llvm/Analysis/InlineCost.h as

    const int SmallTripCountMax = 10;

to match the default value of UnrollMaxIterationsCountToAnalyze in lib/Transforms/Scalar/LoopUnrollPass.cpp.
The patch uses DominatorTree, LoopInfo, and ScalarEvolution to find loops and potential constant loop trip counts in the functions considered for inlining. An instance of InliningLoopInfoCache caches these results per function, so that compile time is not spent recomputing the DominatorTree, LoopInfo, and ScalarEvolution information for functions into which no inlining has occurred. Our experiments show that this cache is effective and essential for keeping the compile-time cost of inlining low.
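The caching idea can be sketched roughly as follows. This is not the patch's InliningLoopInfoCache. LoopSummary, LoopInfoCacheSketch, Analyzer, and hasSmallConstantTripCount are hypothetical names, functions are keyed by a plain string, and the loop analyses are replaced by a caller-supplied callback. The sketch assumes an entry is dropped only when something has been inlined into that function.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Mirrors the constant quoted in the description.
constexpr int SmallTripCountMax = 10;

// Hypothetical per-function summary of the loop analysis.
struct LoopSummary {
  std::optional<unsigned> InnerTripCount; // set only when the count is constant
};

// Hypothetical cache: runs the (expensive) analysis at most once per function
// until that function is invalidated.
class LoopInfoCacheSketch {
public:
  using Analyzer = std::function<LoopSummary(const std::string &)>;

  explicit LoopInfoCacheSketch(Analyzer A) : Analyze(std::move(A)) {}

  // Return the cached summary, computing it on first use.
  const LoopSummary &get(const std::string &Fn) {
    auto It = Cache.find(Fn);
    if (It == Cache.end())
      It = Cache.emplace(Fn, Analyze(Fn)).first;
    return It->second;
  }

  // Call after inlining into Fn, since its loops may have changed.
  void invalidate(const std::string &Fn) { Cache.erase(Fn); }

private:
  Analyzer Analyze;
  std::map<std::string, LoopSummary> Cache;
};

// True when the trip count is a known constant no larger than the limit.
bool hasSmallConstantTripCount(const LoopSummary &S) {
  return S.InnerTripCount &&
         *S.InnerTripCount <= static_cast<unsigned>(SmallTripCountMax);
}
```

Repeated queries for an unchanged function reuse the stored summary, and only functions that were modified by inlining pay for a fresh analysis.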
Patch by Robert Cox