This is an archive of the discontinued LLVM Phabricator instance.

[Inliner] Boost inlining of an indirect call to always_inline function.
AbandonedPublic

Authored by eraman on Jun 21 2017, 12:11 PM.

Details

Reviewers
chandlerc
Summary

If inlining baz->bar makes an indirect call inside bar to be a direct
call to foo with always_inline hint, boost the chances of baz->bar
inlining (and therefore bar->foo inlining) by adding a large negative
cost. I have used LastCallToStaticBonus here, but if that makes it
confusing I will create another constant with similar high value.

Event Timeline

eraman created this revision.Jun 21 2017, 12:11 PM
davidxl added inline comments.Jun 21 2017, 9:20 PM
lib/Analysis/InlineCost.cpp
978

Maybe rename LastCallToStaticBonus more appropriately ?

eraman added inline comments.Jun 22 2017, 10:15 AM
lib/Analysis/InlineCost.cpp
978

I am open to suggestions. How about AlmostAlwaysInlineBonus ?

This seems likely to lead to surprising behavior. LastCallToStaticBonus is very large, so we'll end up duplicating a lot of code in some cases, which could be surprising. On the other hand, it isn't infinitely large, so you can't depend on always_inline to trigger the behavior reliably.

What's the motivation for this change?

This seems likely to lead to surprising behavior. LastCallToStaticBonus is very large, so we'll end up duplicating a lot of code in some cases, which could be surprising. On the other hand, it isn't infinitely large, so you can't depend on always_inline to trigger the behavior reliably.

What's the motivation for this change?

Consider a function B that takes a function pointer as an argument and calls that. Let's assume A invokes B and passes a pointer to C which has always_inline attribute. It is reasonable to relax the A->B inlining limits so that we could honor the always_inline hint on C. I think this is a fairly uncommon scenario that size increase is not a concern in general, but yes this could end up duplicating a lot in some cases. One way to think about is what could the programmer do to ensure C always gets inlined. They could clone B to B' that directly calls C and change a subset of B's callers to B'. This roughly doubles the size of B. We could perhaps approximate this by giving a bonus of <some large value> / (# callers of B in the module).

I think this is a fairly uncommon scenario that size increase is not a concern in general,

All code written in C contains always_inline functions because C libraries mark inline functions always_inline. This means we're potentially introducing gigantic size regressions into every piece of C code in existence. I don't want to deal with that.

chandlerc edited edge metadata.Jun 22 2017, 2:35 PM

FWIW, I'm not really a fan of this behavior in the inliner.

I know the test case that motivates this, and as was discussed previously I suspect using a template argument would be better. I think this is actually a general pattern we can recommend rather than using alway_inline for this:

__attribute__((always_inline)) inline void f(int, int) {
  volatile int x;
  x = 42; x = 42; x = 42; x = 42;
}

template <typename FunctionT, FunctionT *Function>
void g(int *a, int *b, int count)  {
  for (int i = 0; i < count; ++i)
    Function(a[i], b[i]);
}

void h(int *a, int *b, int count) {
  g<decltype(f), f>(a, b, count);
}

Even at -O0 this doesn't allows the always_inline to have its effect: https://godbolt.org/g/S4HzTD

It also makes it explicit in the source code that the programmer *wants* code duplication. For something this powerful, that seems like the right general approach.

Chandler's example does not involve any indirect calls so it is not related to this patch.

I think Easwaran's patch is consistent with the current inline behavior -- it boots inlining to callee with indirect callsites that can be turned into direct calls and then further inlined. The amount of boost is the benefit of inlining the indirect callsites (i.e., the difference between inline cost and allowed threshold). In fact, if the CA.analyzeCall call in the existing code

auto IndirectCallParams = Params;
IndirectCallParams.DefaultThreshold = InlineConstants::IndirectCallThreshold;
CallAnalyzer CA(TTI, GetAssumptionCache, GetBFI, PSI, *F, CS,
                IndirectCallParams);
if (CA.analyzeCall(CS)) {
    ...

is turned into getInlineCost(CS), it will achieve similar effect -- except that InlineCost associated with AlwaysInline is a little too extreme.

In many cases, AlwaysInline is very strong hint by the user to indicate performance benefit of inlining the function, it does not make sense to completely ignore it.

Chandler's example does not involve any indirect calls so it is not related to this patch.

Actually, it is...

The original test case doesn't look exactly like my example. Rather than a template argument and a call through that template argument, it uses an indirect call to a normal function argument.

My suggestion is that the code should be changed to use a template argument if the intent is to make always_inline actually *always* inline.

I think Easwaran's patch is consistent with the current inline behavior -- it boots inlining to callee with indirect callsites that can be turned into direct calls and then further inlined. The amount of boost is the benefit of inlining the indirect callsites (i.e., the difference between inline cost and allowed threshold). In fact, if the CA.analyzeCall call in the existing code

auto IndirectCallParams = Params;
IndirectCallParams.DefaultThreshold = InlineConstants::IndirectCallThreshold;
CallAnalyzer CA(TTI, GetAssumptionCache, GetBFI, PSI, *F, CS,
                IndirectCallParams);
if (CA.analyzeCall(CS)) {
    ...

is turned into getInlineCost(CS), it will achieve similar effect -- except that InlineCost associated with AlwaysInline is a little too extreme.

In many cases, AlwaysInline is very strong hint by the user to indicate performance benefit of inlining the function, it does not make sense to completely ignore it.

I don't think this is really the right way to model the always_inline attribute. I continue to think that this attribute should mean that we very literally *always* inline. See the discussion here for more details:
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089466.html

(Note that I *also* am still in favor of having an attribute that is actually a very strong hint to do inlining, I just don't think any of the existing attributes really fit that bill, and I don't think the original motivating test case is a particularly compelling example. Even if a user *does* want to get really strong inlining because of a weird indirect function call, I don't know why the inliner has to go to great lengths to detect this rather than expecting the user to also attribute the function receiving the function pointer...)

eraman abandoned this revision.Jun 26 2017, 11:01 AM