This is an archive of the discontinued LLVM Phabricator instance.

Differential D120584

[NewPM][Inliner] Mark inlined calls to functions in same SCC as callee as noinline
Abandoned · Public

Authored by aeubanks on Feb 25 2022, 11:41 AM.

Details

Reviewers

nikic
mtrofin
davidxl
hoy
wenlei
ctetreau
kazu

Summary

When processing the list of calls created by inlining, check each call
to see if the new call's callee is in the same SCC as the original
callee's. If so, mark the call as noinline.

This is an alternative to D98481, which forbids inlining functions in a
non-trivial SCC at all. This allows us to do one level of inlining, plus
inlining of any other calls to functions outside the SCC.
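
As a rough illustration of the rule described above, the check can be modeled on plain integer SCC ids. This is a minimal sketch, not the patch itself: `shouldMarkNoInline` and the integer ids are hypothetical stand-ins for the `LazyCallGraph` lookups, and the caller-SCC exemption reflects the `CalleeSCC != C` condition that appears in the diff further down this page.

```cpp
#include <cassert>

// Hypothetical model: each function is represented only by the id of the
// SCC it belongs to. In the real inliner these would be LazyCallGraph SCCs.
//
// callerScc:    SCC of the function being inlined into
// calleeScc:    SCC of the function that was just inlined
// newCalleeScc: SCC of the callee of a call site exposed by that inlining
bool shouldMarkNoInline(int callerScc, int calleeScc, int newCalleeScc) {
  // Calls that stay inside the caller's own SCC are left alone.
  if (calleeScc == callerScc)
    return false;
  // The exposed call leads back into the SCC we just inlined from.
  return newCalleeScc == calleeScc;
}
```

Under this model, a call exposed by inlining from SCC 1 into a caller in SCC 0 is marked only if it targets SCC 1 again; calls to any other SCC remain inlinable.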

Hopefully fixes PR45253.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60,060 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vloxseg.c
60,070 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vluxseg.c
60,070 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vloxseg.c
60,050 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vluxseg.c

Event Timeline

aeubanks created this revision. Feb 25 2022, 11:41 AM

Herald added a reviewer: ctetreau. Feb 25 2022, 11:41 AM

Herald added subscribers: ormris, hiraditya.

aeubanks requested review of this revision. Feb 25 2022, 11:41 AM

Herald added a project: Restricted Project. Feb 25 2022, 11:41 AM

Herald added a subscriber: llvm-commits.

aeubanks mentioned this in D98481: [Inliner] Do not inline mutual-recursive functions to avoid exponential size growth. Feb 25 2022, 11:42 AM

will fix the two failing tests if we like this patch

this does cause fewer instances of inlining on llvm-test-suite

I did some performance testing on some internal benchmarks, seeing mixed results

cuviper added a subscriber: cuviper. Feb 25 2022, 12:09 PM

Harbormaster completed remote builds in B151525: Diff 411483. Feb 25 2022, 12:14 PM

aeubanks edited reviewers, added: nikic, mtrofin, davidxl, hoy, wenlei; removed: ctetreau. Feb 28 2022, 4:55 PM

Herald added a reviewer: ctetreau. Feb 28 2022, 4:55 PM

nikic added inline comments. Mar 1 2022, 5:44 AM

llvm/lib/Transforms/IPO/Inliner.cpp
942	Shouldn't we also check `&& CG.lookup(CG.get(*NewCallee)) != C` here? We do want to allow this if caller, the callee and the new callee are all in the same SCC, right? If everything is one SCC, then the inlining is size-limited through the usual mechanism.

Testing this with rust, while D98481 only had minor effect on performance (regressions in the sub-1% range that are more than compensated by noinline removal it enables), this one has pretty big impact (filter by "check" on https://perf.rust-lang.org/compare.html?start=4a56cbec59903a830a5fc06c5c81956de4199584&end=a6d1b910a312aa142487362987a3ce22c6da7684) with regressions up to 12%.

This *might* be related to the missing check I pointed out above.

In D120584#3346280, @aeubanks wrote:

will fix the two failing tests if we like this patch

this does cause fewer instances of inlining on llvm-test-suite

I did some performance testing on some internal benchmarks, seeing mixed results

IIUC, we sometimes want to allow even recursive function inlining, up to a point; and same for mutually-recursive ones. For instance, if A<->B, both A and B may be worth inlining in B and A, respectively, for their respective base cases.

Since there are regressions we've observed, sounds like we can isolate 1-2 of those and better understand the problem?

mtrofin added a reviewer: kazu. Mar 1 2022, 7:23 AM

skip noinline if in caller's SCC

going to rerun benchmarks with new revision

Harbormaster completed remote builds in B151995: Diff 412153. Mar 1 2022, 11:09 AM

Updated patch is performance-neutral on our side. Based on some spot checks, it does resolve the catastrophic inlining issue as well. The general approach also looks reasonable.

llvm/lib/Transforms/IPO/Inliner.cpp
942	Add a comment here for why the restriction exists.

This revision is now accepted and ready to land. Mar 2 2022, 12:43 AM

Herald added a project: Restricted Project. Mar 2 2022, 12:43 AM

add comment

still investigating one regressing internal benchmark

Harbormaster completed remote builds in B152234: Diff 412529. Mar 2 2022, 2:06 PM

davidxl added inline comments. Mar 3 2022, 11:09 PM

llvm/lib/Transforms/IPO/Inliner.cpp
951	This feels very strict. Is it possible to allow some level (under an option) of iterative inlining into the same SCC? Not all cases of inlining into a non-trivial SCC will result in a compile-time issue, because the existing threshold-based stopping mechanism applies (i.e. the new callee in the SCC is too large).
llvm/test/Transforms/Inline/mut-rec-scc.ll
41	Can you add a callsite to a function test3() which is outside the SCC, where test3's inlining is deferred until test2 is inlined? This is to test that test3's inlining won't be suppressed.

nikic added inline comments. Mar 4 2022, 12:45 AM

llvm/lib/Transforms/IPO/Inliner.cpp
941–949	Just a suggestion, based on my understanding...
951	This does not restrict inlining into the same SCC. What it prevents is continuously peeling off more and more calls from a *different* non-trivial SCC.

From what I can tell, this is a rare case: only two programs in llvm-test-suite show any codegen changes with this patch.

davidxl added inline comments. Mar 4 2022, 10:49 AM

llvm/lib/Transforms/IPO/Inliner.cpp
951	To clarify, my understanding of the scenario is the following:

foo --> (a, b, c), where foo calls 'a' and a, b, and c form a non-trivial SCC.

Without the fix, we can end up inlining foo->a, foo->b, foo->c, foo->a, foo->b, ... until foo blows up (we don't have a caller limit set in the inliner).

With this patch, once foo->a is inlined, the new callsite foo->b is marked as noinline.

Is that right?

nikic added inline comments. Mar 4 2022, 11:22 AM

llvm/lib/Transforms/IPO/Inliner.cpp
951	Yeah, that sounds about right. I believe local inlining history will actually make sure that we only inline foo->a, foo->b, foo->c, which is what prevents this from simply inlining infinitely. But it still ends up flattening the SCC, which is particularly problematic if there are multi-edges in the SCC, in which case flattening is exponential.

D98481 forbids the foo->a inline in the first place, while this patch allows inlining foo->a, but forbids the foo->b inline. There are some cases where this kind of single-level inline is probably useful, e.g. if inlining at this call-site can prune away the recursive parts entirely.
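
The flattening behaviour discussed in this thread can be illustrated with a toy model. The sketch below is not inliner code: `Graph`, `makeRing`, `countInlines` and `inlinesFromOutside` are hypothetical names, the cost model is ignored entirely (every candidate is assumed profitable), and the inline history is approximated as "do not inline a function into a call site that was produced by inlining that same function".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy call graph for a single SCC: g[f] lists the callees of function f;
// a repeated entry models a multi-edge (several call sites to one callee).
using Graph = std::vector<std::vector<int>>;

// Builds a ring of n functions where each one calls the next `edges` times.
Graph makeRing(int n, int edges) {
  Graph g(n);
  for (int f = 0; f < n; ++f)
    for (int e = 0; e < edges; ++e)
      g[f].push_back((f + 1) % n);
  return g;
}

// Counts inlines performed into an outside caller, starting with one call to
// `callee`. `onChain` plays the role of the inline history for the current
// chain of inlined call sites.
std::uint64_t countInlines(const Graph &g, int callee,
                           std::vector<bool> &onChain, bool oneLevelOnly,
                           int depth = 0) {
  if (onChain[callee])
    return 0; // blocked by the (modeled) inline history
  if (oneLevelOnly && depth > 0)
    return 0; // exposed call sites were marked noinline
  onChain[callee] = true;
  std::uint64_t total = 1;
  for (int next : g[callee])
    total += countInlines(g, next, onChain, oneLevelOnly, depth + 1);
  onChain[callee] = false;
  return total;
}

// Number of inlines into a caller outside the SCC that calls function 0.
std::uint64_t inlinesFromOutside(const Graph &g, bool oneLevelOnly) {
  std::vector<bool> onChain(g.size(), false);
  return countInlines(g, 0, onChain, oneLevelOnly);
}
```

In this model a ring of three functions with single edges gives three inlines (foo->a, foo->b, foo->c), two call sites per edge give 1 + 2 + 4 = 7, and the count roughly doubles for every further function in the ring. With the one-level restriction the count is always 1.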

davidxl added inline comments. Mar 4 2022, 1:48 PM

llvm/lib/Transforms/IPO/Inliner.cpp
951	What I am thinking is a light weight mechanism to fix this with more flexibility. Assuming the number of callsites (from outside of SCC) into non-trivial SCC is small, a map from the <caller, non-trivialSCC> pairs to inlined count can be used to track the number of iterative inlining happening between the caller and SCC. If it exceeds the limit, mark the new callsite noInline.
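
The counter proposed here could look roughly like the following. This is only a sketch of the idea as described in the comment: the class name `CrossSccInlineBudget`, the string/integer keys and the limit value are all hypothetical, and nothing like this appears in the diff under review.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Tracks how many times a given caller has inlined from a given non-trivial
// SCC. Keys are a (caller name, SCC id) pair; both are stand-ins for the
// Function and LazyCallGraph::SCC pointers a real implementation would use.
class CrossSccInlineBudget {
public:
  explicit CrossSccInlineBudget(unsigned limit) : limit_(limit) {}

  // Returns true and records the inline if the pair is still under the
  // limit. Returns false once the limit is reached, which is the point at
  // which the new call site would be marked noinline.
  bool recordInline(const std::string &caller, int sccId) {
    unsigned &count = counts_[{caller, sccId}];
    if (count >= limit_)
      return false;
    ++count;
    return true;
  }

private:
  unsigned limit_;
  std::map<std::pair<std::string, int>, unsigned> counts_;
};
```

A limit of 1 would behave like the one-level rule of this patch for a given caller and SCC, while larger limits allow some iterative inlining before the cut-off.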

wenlei added inline comments. Mar 5 2022, 11:53 AM

llvm/lib/Transforms/IPO/Inliner.cpp
951	This change is indeed better than D98481, but I'm still not sure if it is good enough from a performance perspective.

I hear the argument about build speed and the earlier comments on strict bottom-up inlining, but different people strike a different balance between build speed and performance. Great if we can achieve both; otherwise, trying to minimize perf impact and giving users flexibility is probably better.

Recursive inlining is weak for llvm even without this change, and this area is somewhat perf sensitive. We have observed a perf gap between gcc and llvm due to less recursive inlining. We have also made changes to allow more recursive inlining with PGO outside of the cgscc inliner, where there are fewer constraints for this kind of issue. D109104 is an example of more aggressive recursive inlining with PGO leading to visible perf movement even on spec.

We can try to measure this change with our internal benchmarks. But overall I agree with David that a solution that gives people flexibility would be better (either an optional inliner size/growth cap, or a cross-SCC inline history counter like David suggested).

nikic added inline comments. Mar 6 2022, 8:23 AM

llvm/lib/Transforms/IPO/Inliner.cpp
951	Would it be sufficient to add an option for disabling the check to get this landed? Does that add the desired flexibility?

To be clear, this is really not about a compile-time / performance tradeoff -- the exponential nature of the issue means that builds just don't finish in any reasonable timeframe anymore. It's a correctness problem, and as such should have been fixed months ago, even if there were a performance impact.

At this point I cannot justify delaying a fix for this critical bug anymore. If upstream is unwilling to accept a "good enough" fix, then we will make this a required patch for downstream LLVM distributors instead. I had hoped to avoid this, because it diverges upstream LLVM from distro-provided LLVM and causes work for each distributor, but at the same time I also can't have an LLVM 14.0.0 release with this issue unfixed, which is where we're headed now.

PS: I am happy to discuss more invasive ways to address this general class of problems after an immediate fix has landed on the release branch. Something I have been toying with is assigning an inlining penalty to inlined call-sites, based on the cost of the callee it was inlined from. This basically allows those call sites to use up remaining inlining budget from the parent function, but not more than that. This is principled, in that it avoids a cost-model bypass if inlining happens in a non-bottom-up fashion, but I suspect that it will fare worse than ad-hoc solutions in practice, because it would also penalize inlining of promoted or devirtualized calls. But this is just a side-note, as this (as well as the other alternatives we have previously discussed) are not suitable as immediate, minimally invasive, low-impact bug fixes.
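
One possible reading of the inlining-penalty idea from the PS above is sketched below. It is an interpretation, not a design taken from any patch: `CallSite`, `tryInline`, the additive penalty and the numeric costs are all assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical call-site record: `penalty` is the cost already spent by the
// inlining that exposed this call site (0 for call sites in original code).
struct CallSite {
  int penalty = 0;
};

// Decides whether to inline a callee of cost `calleeCost` at `site` under
// `threshold`. On success, `exposed` receives the penalty for call sites that
// this inlining exposes, so they can only spend what is left of the budget.
bool tryInline(const CallSite &site, int calleeCost, int threshold,
               CallSite &exposed) {
  int effectiveCost = site.penalty + calleeCost;
  if (effectiveCost >= threshold)
    return false;
  exposed.penalty = effectiveCost;
  return true;
}
```

With a threshold of 225 and callees of cost 100, the first two nested inlines succeed and the third is rejected, regardless of which SCC the callees belong to. The same accounting would also apply to promoted or devirtualized calls, which is the downside mentioned in the comment.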

xbolva00 added a subscriber: xbolva00. Mar 6 2022, 8:32 AM

xbolva00 added inline comments.

llvm/lib/Transforms/IPO/Inliner.cpp
951	Your fix looks good enough; better fix could be added to point release 14.0.1

Adding an option is a way to go forward, but I think it should be off by default. If it is on by default, it is very likely to cause some performance regressions.

As long as the option is there for rustc to turn on, we can agree to disagree...

In D120584#3362467, @davidxl wrote:

Adding an option is a way to go forward, but I think it should be off by default. If it is on by default, it is very likely to cause some performance regressions.

Or.. many bug reports about infinite compile times. Bad trade off.

The number one goal of every compiler should be correct and working compilation process.

I would not object turning it on by default when the more elaborate solution is there.

If we turn the current solution on by default, we run the risk of turning one bug (compile time) into another (performance) -- depends on the users, it is not a 100% good tradeoff to make.

Can somebody provide any perf data to justify “default off”?

The LLVM user base is large; the only way to get an answer to that is to turn it on and let users report -- but in my experience, that is a very disruptive process, and users may spend a long time figuring out the source of regressions. Given the nature of the change, the chances of that happening are quite high.

It is reasonable to expect either 1) a proper/complete fix; or 2) a stop gap solution which user can use to combat the compile time issue. The compile time issue is there since forever and not a recent regression. 2) is one step in the right direction and the final goal is to achieve 1).

There's a couple of reasons why a default-disabled option is not great:

It's not compatible with linker plugin LTO. People would have to be experts on LLVM internals and know that they need to add something like -Wl,-mllvm,-enable-unbounded-cross-scc-inlining=0 to their build system if they are linking rust object files.
This problem is not Rust-specific. According to https://discourse.llvm.org/t/rust-newpm-blocker-catastrophic-inlining/6171/2, Apple uses the earlier version of this patch, though not sure for which toolchain(s). IIRC Mozilla previously encountered this with C++ code.
While I agree that correctness and performance is just another tradeoff, we generally always trade off in favor of correctness, unless we expect a widespread performance impact, in which case may temporarily trade off in favor of performance. There is no evidence that this patch has any widespread impact, but there is evidence against it: For rustc performance tests, this patch is entirely performance neutral (while the previous version had minor negative impact) and the entirety of llvm-test-suite has only two programs even showing codegen changes with this patch. There might be an impact in some isolated cases (actually, this is pretty much guaranteed with any inliner change), but all evidence points towards no widespread impact, and as such I don't think there is justification for favoring performance over correctness.

It is reasonable to expect either 1) a proper/complete fix;

I do think this is a proper fix. There are some alternative ways to fix the issue which may well be better for some cases, but frankly I don't understand how you would even evaluate that without knowing about specific instances that this patch would regress. Knowing specific regressions, we could check whether an alternative patch avoids them, but without that, isn't this just guesswork?

The compile time issue is there since forever and not a recent regression.

This is true, but the new pass manager has exacerbated this issue, which is a recent regression. We did encounter a few instances of the same issue with the legacy pass manager as well, but the issue has become much more wide-spread and pressing with the new pass manager.

All that being said, I would take a default-disabled option over nothing at all. We would enable it by default in rustc, and I would give a recommendation to enable it by default in Fedora/RHEL LLVM, but at least we would not be forcing all distros to adopt the patch.

In D120584#3362634, @nikic wrote:

There's a couple of reasons why a default-disabled option is not great:

It's not compatible with linker plugin LTO. People would have to be experts on LLVM internals and know that they need to add something like -Wl,-mllvm,-enable-unbounded-cross-scc-inlining=0 to their build system if they are linking rust object files.

This problem is not Rust-specific. According to https://discourse.llvm.org/t/rust-newpm-blocker-catastrophic-inlining/6171/2, Apple uses the earlier version of this patch, though not sure for which toolchain(s). IIRC Mozilla previously encountered this with C++ code.

While I agree that correctness and performance is just another tradeoff, we generally always trade off in favor of correctness, unless we expect a widespread performance impact, in which case may temporarily trade off in favor of performance. There is no evidence that this patch has any widespread impact, but there is evidence against it: For rustc performance tests, this patch is entirely performance neutral (while the previous version had minor negative impact) and the entirety of llvm-test-suite has only two programs even showing codegen changes with this patch. There might be an impact in some isolated cases (actually, this is pretty much guaranteed with any inliner change), but all evidence points towards no widespread impact, and as such I don't think there is justification for favoring performance over correctness.

It is reasonable to expect either 1) a proper/complete fix;

I do think this is a proper fix.

There is a reason to believe this is not the proper fix: there are callsites that are perfectly ok to be inlined but will get disabled due to the way the fix is done. It is considered proper if the fix actually detects the compile time budget will be reached -- not at the first inlining.

I guess we need to agree to disagree here.

There are some alternative ways to fix the issue which may well be better for some cases, but frankly I don't understand how you would even evaluate that without knowing about specific instances that this patch would regress. Knowing specific regressions, we could check whether an alternative patch avoids them, but without that, isn't this just guesswork?

The compile time issue is there since forever and not a recent regression.

This is true, but the new pass manager has exacerbated this issue, which is a recent regression. We did encounter a few instances of the same issue with the legacy pass manager as well, but the issue has become much more wide-spread and pressing with the new pass manager.

All that being said, I would take a default-disabled option over nothing at all. We would enable it by default in rustc, and I would give a recommendation to enable it by default in Fedora/RHEL LLVM, but at least we would not be forcing all distros to adopt the patch.

Agree -- let's make this step happen first to unblock the affected users.

aeubanks mentioned this in D121084: [NewPM][Inliner] Make inlined calls to functions in same SCC as callee exponentially expensive. Mar 6 2022, 11:53 PM

aeubanks added inline comments. Mar 7 2022, 12:26 AM

llvm/lib/Transforms/IPO/Inliner.cpp
951	https://reviews.llvm.org/D121084 is a similar alternative to this where instead of marking call sites `noinline`, we make the inline cost of these sorts of calls exponentially more expensive. it's definitely not as good at preventing bloat as this patch based on testing variants of mut-rec-scc.ll, but perhaps it's good enough for real world examples?
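
The "exponentially more expensive" alternative can be pictured as follows. This is a minimal sketch of the general idea only: `penalizedCost`, the `depth` counter and the multiplier value are hypothetical and are not taken from D121084.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Scales an inline cost by multiplier^depth, where `depth` is the number of
// earlier inlines through the same SCC that led to this call site.
// Assumes baseCost >= 0 and multiplier >= 1. The result saturates at the
// largest representable value instead of overflowing.
std::int64_t penalizedCost(std::int64_t baseCost, unsigned depth,
                           std::int64_t multiplier) {
  const std::int64_t maxCost = std::numeric_limits<std::int64_t>::max();
  std::int64_t cost = baseCost;
  for (unsigned i = 0; i < depth; ++i) {
    if (cost > maxCost / multiplier)
      return maxCost; // saturate rather than overflow
    cost *= multiplier;
  }
  return cost;
}
```

Unlike a noinline marker, this leaves the decision to the ordinary threshold comparison: a cheap callee can still be inlined a few levels deep, but the cost grows quickly enough that the inlining eventually stops.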

xbolva00 added inline comments. Mar 7 2022, 3:09 AM

llvm/lib/Transforms/IPO/Inliner.cpp
953	What if ICB is always_inline?

I think we need to land this patch. It's fixing a major regression from the new pass manager which is blocking some users from even upgrading to LLVM 13. I understand that there are concerns that it may regress performance in some cases, but I think that fixing the regression is more important, especially since we have not actually identified any real-world cases that would regress. It also sounds like there is at least one downstream user working around this regression in their own tree, and given the impact of the problem, we would also have to fix this downstream in Fedora.

I'm fairly certain @davidxl isn't interested in LLVM release branches, so there's the temporary option of landing this only in the release branch, with the promise of a better solution in a reasonable timeframe on ToT.

If this is doable, it feels like a winning strategy to me.

add cl::opt

actually given that multiple people have complained about this issue, perhaps it's better with the cl::opt to just submit this, pass -mllvm where performance may be impacted, and have impacted people investigate this and come up with a potentially better solution

llvm/test/Transforms/Inline/mut-rec-scc.ll
41	do you mean add a call inside test3 to some other new function (e.g. test4)?

davidxl added inline comments. Mar 7 2022, 10:11 AM

llvm/lib/Transforms/IPO/Inliner.cpp
96	As I stated, I prefer this off by default, or the alternate patch: https://reviews.llvm.org/D121084
llvm/test/Transforms/Inline/mut-rec-scc.ll
41	ok to leave this as it is for now.

Harbormaster completed remote builds in B152966: Diff 413531. Mar 7 2022, 10:23 AM

In D120584#3364487, @aeubanks wrote:

actually given that multiple people have complained about this issue, perhaps it's better with the cl::opt to just submit this, pass -mllvm where performance may be impacted, and have impacted people investigate this and come up with a potentially better solution

Right. I'd like to emphasize again that if impacted cases get reported, this allows us to address them in a data-driven way. For example, we now have two very similar solutions here and in D121084, but we don't have any way to evaluate whether the latter is better in any practical sense. If we had an impacted case, we could check whether or not it would be affected by D121084, with which cost multiplier, etc. Without having an affected case, we can only guess.

In D120584#3364322, @davidxl wrote:

If this is doable, it feels like a winning strategy to me.

Yes, we could commit this patch directly to the release branch, and it seems like we may have to do this to get it in in time for the release, but I don't want that to slow down the effort to fix this in main. I think we still need to come to some consensus on how to fix this in a timely manner in the main branch.

llvm/lib/Transforms/IPO/Inliner.cpp
96	I'm a little confused by all negatives in the option, but I think the default state should be whatever state will fix PR45253. This at least will get us closer to the old-pass-manager behavior and will unblock users. I don't think the option is particularly useful if the default state doesn't fix PR45253.

As I mentioned, the cost of investigating performance regressions is high. However if y'all think it is better to shift the cost, I won't stay in the way.

I've also approved the other patch.

aeubanks mentioned this in rG53e5e586709a: [NewPM][Inliner] Make inlined calls to functions in same SCC as callee… Mar 7 2022, 11:54 PM

I know that we have settled on D121084, but just to provide a data point to support the earlier concern that David and I raised about perf impact.

I measured this on a large internal workload (at Meta), and there's a consistent 0.5% CPU regression with this patch.

This revision now requires changes to proceed. Mar 10 2022, 2:19 PM

aeubanks abandoned this revision. Mar 14 2022, 4:41 PM

davide mentioned this in D138602: [WIP] Alwaysinliner time explosion with new pass manager. Nov 23 2022, 1:14 PM

Revision Contents

Path	Size
llvm/lib/Transforms/IPO/Inliner.cpp	14 lines
llvm/test/Transforms/Inline/mut-rec-scc-2.ll	19 lines
llvm/test/Transforms/Inline/mut-rec-scc.ll	68 lines

Diff 412153

llvm/lib/Transforms/IPO/Inliner.cpp

[87 lines not shown]

  /// lifetime markers. It is now in the process of being removed. To experiment
  /// with disabling it and relying fully on lifetime marker based stack
  /// coloring, you can pass this flag to LLVM.
  static cl::opt<bool>
      DisableInlinedAllocaMerging("disable-inlined-alloca-merging",
                                  cl::init(false), cl::Hidden);
  /// A flag for test, so we can print the content of the advisor when running it
  /// as part of the default (e.g. -O3) pipeline.

  static cl::opt<bool> KeepAdvisorForPrinting("keep-inline-advisor-for-printing",
                                              cl::init(false), cl::Hidden);
  extern cl::opt<InlinerFunctionImportStatsOpts> InlinerFunctionImportStats;
  static cl::opt<std::string> CGSCCInlineReplayFile(
      "cgscc-inline-replay", cl::init(""), cl::value_desc("filename"),
      cl::desc(

[766 lines not shown; enclosing context: while (!Calls->empty() && Calls->front().first->getCaller() == &F) {]

    continue;
  }
  // Check if this inlining may repeat breaking an SCC apart that has
  // already been split once before. In that case, inlining here may
  // trigger infinite inlining, much like is prevented within the inliner
  // itself by the InlineHistory above, but spread across CGSCC iterations
  // and thus hidden from the full inline history.
- if (CG.lookupSCC(*CG.lookup(Callee)) == C &&
-     UR.InlinedInternalEdges.count({&N, C})) {
+ LazyCallGraph::SCC *CalleeSCC = CG.lookupSCC(*CG.lookup(Callee));
+ if (CalleeSCC == C && UR.InlinedInternalEdges.count({&N, C})) {
    LLVM_DEBUG(dbgs() << "Skipping inlining internal SCC edge from a node "
                         "previously split out of this SCC by inlining: "
                      << F.getName() << " -> " << Callee.getName() << "\n");
    setInlineRemark(*CB, "recursive SCC split");
    continue;
  }
  std::unique_ptr<InlineAdvice> Advice =

[41 lines not shown; enclosing context: while (!Calls->empty() && Calls->front().first->getCaller() == &F) {]

  if (!NewCallee) {
    // Try to promote an indirect (virtual) call without waiting for
    // the post-inline cleanup and the next DevirtSCCRepeatedPass
    // iteration because the next iteration may not happen and we may
    // miss inlining it.
    if (tryPromoteCall(*ICB))
      NewCallee = ICB->getCalledFunction();
  }
- if (NewCallee)
-   if (!NewCallee->isDeclaration())
+ if (NewCallee) {
+   if (!NewCallee->isDeclaration()) {
      Calls->push({ICB, NewHistoryID});
+     if (CalleeSCC != C &&
+         CalleeSCC == CG.lookupSCC(CG.get(*NewCallee))) {

+       ICB->addFnAttr(Attribute::NoInline);
+     }
  }

nikic (Unsubmitted, Not Done), suggested edit:

  Calls->push({ICB, NewHistoryID});
- // Continually inlining through an SCC can result in huge compile
- // times and bloated code since we arbitrarily stop at some point
- // when the inliner decides it's not profitable to inline
- // anymore. We put a stop at the first potential attempt at
- // inlining through an SCC by marking the call site as noinline.
- // This doesn't apply to calls in the same SCC since if we do
- // inline through the SCC the function will end up being
- // self-recursive which the inliner bails out on, and inlining
- // within an SCC is necessary for performance.
+ // When inlining from a non-trivial SCC, we may end up flattening
+ // the SCC into calling code, as each individual inlining decision
+ // may be considered profitable. For SCCs with multiple recursive
+ // calls, this can result in an exponential increase in code size
+ // and compile-time. Avoid this by only allowing one level of
+ // inlining across non-trivial SCCs.
+ // This doesn't apply to calls in the same SCC, since inlining will
+ // stop if either the function becomes self-recursive, or all
+ // functions in the SCC become large enough to not permit further
+ // inlining.
  if (CalleeSCC != C &&

Just a suggestion, based on my understanding...

  // Merge the attributes based on the inlining.
  AttributeFuncs::mergeAttributesForInlining(F, Callee);

davidxlUnsubmitted

Not Done

This feels like very strict. Is it possible to allow some level (under an option) of iterative inlining into the same SCC? This is because not all cases of inlining into a non-trivial SCC will result in compile time issue due to existing threshold based stopping mechanism (i.e. the new callee in SCC is too large).

davidxl: This feels like very strict. Is it possible to allow some level (under an option) of iterative…

nikicUnsubmitted

Not Done

This does not restrict inlining into the same SCC. What it prevents is continuously peeling off more and more calls from a *different* non-trivial SCC.

From what I can tell, this is a rare case, only two programs in llvm-test-suite show any codegen changes with this patch.

nikic: This does not restrict inlining into the same SCC. What it prevents is continuously peeling off…

davidxlUnsubmitted

Not Done

To clarify, my understanding of the scenario is like the following

foo --> (a, b, c) where foo calls 'a'.

where a, b, and c forms anon trivial SCC.

Without the fix, we can end up with inlining of foo->a, foo->b, foo->c, foo->a, foo->b, .... until foo blows up (we don't have a caller limit set in the inliner).

With this patch, once foo-> a is inlined, the new callsite foo->b is marked as no inline.

Is that the right?

davidxl: To clarify, my understanding of the scenario is like the following foo --> (a, b, c)…

nikicUnsubmitted

Not Done

Yeah, that sounds about right. I believe local inlining history will actually make sure that we only inline foo->a, foo->b, foo->c, which is what prevents this from simply inlining infinitely. But it still ends up flattening the SCC, which is particularly problematic if there are multi-edges in the SCC, in which case flattening is exponential.

D98481 forbids the foo->a inline in the first place, while this patch allows inlining foo->a, but forbids the foo->b inline. There are some cases where this kind of single-level inline is probably useful, e.g. if inlining at this call-site can prune away the recursive parts entirely.

nikic: Yeah, that sounds about right. I believe local inlining history will actually make sure that we…

davidxlUnsubmitted

Not Done

What I am thinking is a light weight mechanism to fix this with more flexibility. Assuming the number of callsites (from outside of SCC) into non-trivial SCC is small, a map from the <caller, non-trivialSCC> pairs to inlined count can be used to track the number of iterative inlining happening between the caller and SCC. If it exceeds the limit, mark the new callsite noInline.

wenlei

This change is indeed better than D98481, but I'm still not sure it is good enough from a performance perspective.

I hear the argument about build speed and the earlier comments on strict bottom-up inlining, but different people strike a different balance between build speed and performance. It would be great if we could achieve both; otherwise, trying to minimize the perf impact and giving users flexibility is probably better.

Recursive inlining is weak in LLVM even without this change, and this area is somewhat perf sensitive. We have observed a perf gap between GCC and LLVM due to less recursive inlining. We have also made changes to allow more recursive inlining with PGO outside of the CGSCC inliner, where there are fewer constraints of this kind. D109104 is an example of more aggressive recursive inlining with PGO leading to visible perf movement even on SPEC.

We can try to measure this change with our internal benchmarks. But overall I agree with David that a solution that gives people flexibility would be better (either an optional inliner size/growth cap, or a cross-SCC inline history counter like David suggested).

nikic

Would it be sufficient to add an option for disabling the check to get this landed? Does that add the desired flexibility?

To be clear, this is really not about a compile-time / performance tradeoff -- the exponential nature of the issue means that builds just don't finish in any reasonable timeframe anymore. It's a correctness problem, and as such should have been fixed months ago, even if there were a performance impact.

At this point I cannot justify delaying a fix for this critical bug anymore. If upstream is unwilling to accept a "good enough" fix, then we will make this a required patch for downstream LLVM distributors instead. I had hoped to avoid this, because it diverges upstream LLVM from distro-provided LLVM and causes work for each distributor, but at the same time I also can't have an LLVM 14.0.0 release with this issue unfixed, which is where we're headed now.

PS: I am happy to discuss more invasive ways to address this general class of problems after an immediate fix has landed on the release branch. Something I have been toying with is assigning an inlining penalty to inlined call-sites, based on the cost of the callee they were inlined from. This basically allows those call sites to use up remaining inlining budget from the parent function, but not more than that. This is principled, in that it avoids a cost-model bypass if inlining happens in a non-bottom-up fashion, but I suspect that it will fare worse than ad-hoc solutions in practice, because it would also penalize inlining of promoted or devirtualized calls. But this is just a side-note, as this (like the other alternatives we have previously discussed) is not suitable as an immediate, minimally invasive, low-impact bug fix.
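One possible reading of the penalty idea, sketched with made-up numbers. Everything here is an assumption for illustration: the `CallSite` fields, the additive penalty, and the single `Threshold` are not taken from LLVM's cost model, which is considerably more involved.

```cpp
#include <cassert>

// Hypothetical call-site record; the fields are illustrative only.
struct CallSite {
  int CalleeCost; // modeled cost of inlining the callee
  int Penalty;    // cost already spent by the inlines that produced this site
};

// A call site may be inlined only if its cost plus the inherited penalty
// still fits in the budget.
bool shouldInline(const CallSite &CS, int Threshold) {
  return CS.CalleeCost + CS.Penalty <= Threshold;
}

// A call site created by inlining `Parent` inherits everything spent so far.
CallSite makeInlinedCallSite(const CallSite &Parent, int NewCalleeCost) {
  return CallSite{NewCalleeCost, Parent.Penalty + Parent.CalleeCost};
}

// How many levels deep a chain of equally expensive callees can be inlined.
int maxInlineDepth(int CalleeCost, int Threshold) {
  assert(CalleeCost > 0 && "a positive cost guarantees termination");
  CallSite CS{CalleeCost, 0};
  int Depth = 0;
  while (shouldInline(CS, Threshold)) {
    ++Depth;
    CS = makeInlinedCallSite(CS, CalleeCost);
  }
  return Depth;
}
```

Under this model a chain of callees costing 40 each stops after two levels with a budget of 100, because the third call site would carry a penalty of 80. The same accounting would also charge a promoted or devirtualized call site, which is the downside mentioned above.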

xbolva00

Your fix looks good enough; a better fix could be added in the 14.0.1 point release.

aeubanks (Author)

https://reviews.llvm.org/D121084 is a similar alternative to this patch: instead of marking call sites noinline, it makes the inline cost of these sorts of calls exponentially more expensive. Based on testing variants of mut-rec-scc.ll, it's definitely not as good at preventing bloat as this patch, but perhaps it's good enough for real-world examples?
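For intuition only, here is a sketch of what an exponentially growing cost could look like. The doubling factor, the depth counter, and the function names are assumptions; the actual formula used in D121084 is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical: the cost doubles for every prior same-SCC inline that led to
// this call site, saturating instead of overflowing.
std::int64_t adjustedCost(std::int64_t BaseCost, unsigned Depth) {
  assert(BaseCost > 0 && "cost must be positive");
  const std::int64_t Max = std::numeric_limits<std::int64_t>::max();
  std::int64_t Cost = BaseCost;
  for (unsigned I = 0; I < Depth; ++I) {
    if (Cost > Max / 2)
      return Max; // saturate
    Cost *= 2;
  }
  return Cost;
}

// Number of successive levels whose adjusted cost still fits the threshold.
// The loop is capped at 64 levels so a saturated cost cannot spin forever.
unsigned inlinableLevels(std::int64_t BaseCost, std::int64_t Threshold) {
  unsigned Depth = 0;
  while (Depth < 64 && adjustedCost(BaseCost, Depth) <= Threshold)
    ++Depth;
  return Depth;
}
```

Unlike a hard noinline mark, this still admits several levels for cheap callees (a base cost of 10 against an illustrative threshold of 225 allows five), which is consistent with the observation that it prevents less bloat.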


xbolva00

What if ICB is always_inline?

				// For local functions or discardable functions without comdats, check
				// whether this makes the callee trivially dead. In that case, we can drop
				// the body of the function eagerly which may reduce the number of callers
				// of other functions to one, changing inline cost thresholds. Non-local
				// discardable functions with comdats are checked later on.
				bool CalleeWasDeleted = false;
				if (Callee.isDiscardableIfUnused() && Callee.hasZeroLiveUses() &&
				    !CG.isLibFunction(Callee)) {
				  if (Callee.hasLocalLinkage() || !Callee.hasComdat()) {

llvm/test/Transforms/Inline/mut-rec-scc-2.ll

This file was added.

				; RUN: opt -S -passes='inline' < %s | FileCheck %s

				; Make sure we don't mark calls within the same SCC as the original function with noinline.
				; CHECK-NOT: noinline

				define void @samescc1() {
				call void @samescc2()
				ret void
				}

				define void @samescc2() {
				call void @samescc3()
				ret void
				}

				define void @samescc3() {
				call void @samescc1()
				ret void
				}

llvm/test/Transforms/Inline/mut-rec-scc.ll

This file was added.

				; RUN: opt -S -passes='cgscc(inline,instcombine)' < %s | FileCheck %s

				; We use call to a dummy function to avoid inlining test1 into test2 or vice
				; versa, such that we aren't left with a trivial cycle, as trivial cycles are
				; special-cased to never be inlined.
				; However, InstCombine will eliminate these calls after inlining, and thus
				; make the functions eligible for inlining in their callers.
				declare void @dummy() readnone nounwind willreturn

				define void @test1() {
				; CHECK-LABEL: define void @test1(
				; CHECK-NEXT: call void @test2()
				; CHECK-NEXT: call void @test2()
				; CHECK-NEXT: ret void
				;
				call void @test2()
				call void @test2()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				ret void
				}

				define void @test2() {
				; CHECK-LABEL: define void @test2(
				; CHECK-NEXT: call void @test1()
				; CHECK-NEXT: call void @test1()
				; CHECK-NEXT: ret void
				;
				call void @test1()
				call void @test1()
				call void @dummy()
				call void @dummy()
				davidxl: Can you add a callsite to a function test3() which is outside the SCC, where test3's inlining is deferred until test2 is inlined? This is to test that test3's inlining won't be suppressed.
				aeubanks: Do you mean add a call inside test3 to some other new function (e.g. test4)?
				davidxl: OK to leave this as it is for now.
				call void @dummy()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				call void @dummy()
				ret void
				}

				; We should inline the @test2 calls and mark the inlined @test1 calls as noinline
				define void @test3() {
				; CHECK-LABEL: define void @test3(
				; CHECK-NEXT: call void @test1() #[[NOINLINE:[0-9]+]]
				; CHECK-NEXT: call void @test1() #[[NOINLINE]]
				; CHECK-NEXT: call void @test1() #[[NOINLINE]]
				; CHECK-NEXT: call void @test1() #[[NOINLINE]]
				; CHECK-NEXT: ret void
				;
				call void @test2()
				call void @test2()
				ret void
				}

				; CHECK: [[NOINLINE]] = { noinline }

Revision Contents

Diff 412153
