This is a fix for GH60773 [0], an "inliner inlines exponentially" problem similar to [1]. The inliner isn't my area of expertise so be forewarned that I might be using terminology wrong below,
In GH44598 [2] a pattern was discovered where the inliner will inline large stretches of an SCC before it realises it's inlining a whole cycle, growing function sizes significantly and increasing compile times. The fix in [3] was to annotate intra-SCC calls with a inline-cost multiplier if they're part of such behaviour, to discourage further inlining.
We've found a scenario [0] where this same behaviour occurs, but where we start inlining an outer RefSCC if a function call is devirtualised. Observe in the attached test case: longname forms one RefSCC, while the chain from rec_call_to_longname to e forms an outer RefSCC:
RefSCC with 1 call SCCs: SCC with 1 functions: longname RefSCC with 4 call SCCs: SCC with 1 functions: rec_call_to_longname SCC with 1 functions: a SCC with 1 functions: c SCC with 1 functions: e
During inlining of functions a, c, e, both rec_call_to_longname and longname are always inlined, the latter because we give a large bonus to any function with calls that can be devirtualised. And because there are two calls in longname that get devirtualised, the number of function calls to rec_call_to_longname doubles each time we inline. This makes function e grow exponentially in the size of the outer RefSCC. This showed up in an LTO link of a large C++ codebase, adding at least an order of magnitude to compile times (it didn't complete). I think the frequency of this behaviour occurring is rare, it may depend on PGO too, but the impact is large when it does occur.
As a fix, in this patch I've extended the original fix from [3] to detect newly devirtualised calls and add the "don't repeatedly inline this" multiplier attribute. The principle is that managing to devirtualise a function call is already a great triumph; but starting to inline from a parent RefSCC is very likely to form a loop. The exact check ("any devirtualised call to outside of SCC we were already inlining") might be too strong -- it could be relaxed or examine the RefSCC structure, I'm not extremely confident on this so have defaulted to being conservative.
The compile time impact is good [4]: there's some noise and also some improvements in compile time. I ran a SPEC2017 intspeed "estimate" with this change and saw some unremarkable differences that were all within the noise. Performance measurement is another area I'm unfamiliar with so I don't want to make any claims, but I don't believe this patch causes a runtime performance regression.
Note the branch-on-undef with profile metadata in the test case: that's what came out of reducing the original long-running sample, seemingly it's part of convincing the inliner to repeatedly inline. I understand we're supposed to be avoiding adding more tests that branch on undef, so I'll try and cook up an alternative, but the rest of the patch is ready for review.
[0] https://github.com/llvm/llvm-project/issues/60773
[1] https://github.com/llvm/llvm-project/issues/44598
[2] https://github.com/llvm/llvm-project/issues/44598#issuecomment-981025691
[3] https://reviews.llvm.org/D121084
[4] http://llvm-compile-time-tracker.com/compare.php?from=2399497c9d68c946764e3089fb0b4c8d29166d67&to=160528d0c9ac9b6f179818664030110faf13ef0b&stat=instructions%3Au
IIUC, there's never actually a cycle (ref or call) in the IR. The problem is that after inlining, we do see the same sort of issue as with inlining through a child SCC because the indirect calls end up ping ponging between @longname and @rec_call_to_longname.