We originally set the hotness threshold to 99.9% to be consistent with GCC FDO. However, the inlining heuristics of the two compilers differ: LLVM uses a bottom-up algorithm, while GCC uses a priority-based one. The LLVM algorithm tends to inline too much early on, which can prevent hot callsites from being further inlined into their callers. Because of this restriction, we think it is reasonable to lower the hotness threshold so that priority goes to callsites that are genuinely hot. Our experiments show that this change improves performance on large applications. Note that the inline heuristics still have considerable room for tuning; once they are refined, we could adjust this threshold again to allow inlining of less hot callsites.
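To make the effect of the threshold concrete, here is a minimal sketch of how a percentile cutoff maps to a minimum "hot" count. It assumes the cutoff is expressed in parts per million (999000 for 99.9%), similar to how LLVM's profile summary stores cutoffs, and that hotness is decided from a flat list of execution counts. `hotCountThreshold` is a hypothetical helper, not an LLVM API. Lowering the cutoff raises the minimum count, so fewer callsites qualify as hot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: returns the smallest count that must be included so
// that the largest counts together cover at least `cutoffPPM` parts per
// million of the total. Counts >= the returned value are considered hot.
uint64_t hotCountThreshold(std::vector<uint64_t> counts, uint64_t cutoffPPM) {
  assert(cutoffPPM <= 1000000 && "cutoff must be in parts per million");
  if (counts.empty()) return 0;

  // Visit counts from hottest to coldest.
  std::sort(counts.begin(), counts.end(), std::greater<uint64_t>());

  uint64_t total = 0;
  for (uint64_t c : counts) total += c;

  // Use long double to avoid overflowing total * cutoffPPM.
  const long double target =
      static_cast<long double>(total) * cutoffPPM / 1000000.0L;

  uint64_t cumulative = 0;
  for (uint64_t c : counts) {
    cumulative += c;
    if (static_cast<long double>(cumulative) >= target) return c;
  }
  return counts.back();
}
```

For counts `{100, 50, 30, 10, 5, 3, 1, 1}`, a 99.9% cutoff makes everything down to a count of 1 hot, while a 99% cutoff raises the minimum hot count to 3.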
Event Timeline
Bottom-up inlining can also create many cold inline instances. Besides blocking hotter callsites from being inlined into their callers, the current Machine Block Layout also has trouble forming long hot traces, which leaves holes in the code layout.
I wonder whether a different fix would be better: 1) compute the working set size at the 99.9% cutoff; 2) if it is too large compared with the working set threshold, lower the hot cutoff.
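A rough sketch of this two-step suggestion, under several assumptions: the working set size is taken to be the number of counts needed to reach the cutoff, the fallback is a single step down to 99% (990000 parts per million), and `computeCutoff`, `chooseHotCutoff`, and `maxWorkingSetSize` are hypothetical names chosen for illustration rather than existing LLVM options.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Result of applying a cutoff to a profile (hypothetical structure).
struct CutoffInfo {
  uint64_t minCount;        // smallest count inside the cutoff
  uint64_t workingSetSize;  // how many counts are needed to reach the cutoff
};

// `sortedDesc` must be sorted from largest to smallest count.
CutoffInfo computeCutoff(const std::vector<uint64_t>& sortedDesc,
                         uint64_t cutoffPPM) {
  assert(cutoffPPM <= 1000000 && "cutoff must be in parts per million");
  uint64_t total = 0;
  for (uint64_t c : sortedDesc) total += c;
  const long double target =
      static_cast<long double>(total) * cutoffPPM / 1000000.0L;

  uint64_t cumulative = 0;
  for (std::size_t i = 0; i < sortedDesc.size(); ++i) {
    cumulative += sortedDesc[i];
    if (static_cast<long double>(cumulative) >= target)
      return {sortedDesc[i], static_cast<uint64_t>(i + 1)};
  }
  return {sortedDesc.empty() ? 0 : sortedDesc.back(),
          static_cast<uint64_t>(sortedDesc.size())};
}

// Step 1: measure the working set at the default (99.9%) cutoff.
// Step 2: if it exceeds the allowed size, fall back to a lower cutoff.
// The reduced value is an assumption; the comment only says "drop" it.
uint64_t chooseHotCutoff(std::vector<uint64_t> counts,
                         uint64_t maxWorkingSetSize,
                         uint64_t defaultCutoffPPM = 999000,
                         uint64_t reducedCutoffPPM = 990000) {
  std::sort(counts.begin(), counts.end(), std::greater<uint64_t>());
  const CutoffInfo atDefault = computeCutoff(counts, defaultCutoffPPM);
  if (atDefault.workingSetSize > maxWorkingSetSize) return reducedCutoffPPM;
  return defaultCutoffPPM;
}
```

The appeal of this design is that small programs keep the GCC-compatible 99.9% cutoff, and only profiles whose hot working set is large pay the cost of a stricter cutoff.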