This is an archive of the discontinued LLVM Phabricator instance.

[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
ClosedPublic

Authored by wenlei on Jul 13 2020, 10:16 PM.

Details

Summary

This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.

A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.

Diff Detail

Event Timeline

wenlei created this revision.Jul 13 2020, 10:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2020, 10:16 PM
mtrofin added inline comments.Jul 13 2020, 11:02 PM
llvm/include/llvm/Analysis/InlineAdvisor.h
182

Nit: could this be factored in its own .h file (and .cpp)?

187

nit: lower case first letter, also verb form (e.g. areReplayRemarksLoaded)

191

Nit: initialize HasReplayRemarks here, then the state of the object is deterministic.

llvm/lib/Analysis/InlineAdvisor.cpp
169

this assumes DIL->getLine() >= DIL->getScope()->getSubprogram()->getLine(). Perhaps an assert before, or Offset could be int, and check it's non-negative?

llvm/lib/Transforms/IPO/SampleProfile.cpp
330

I'm curious, is there a case when FAM would be not passed? (i.e. why not just require it?)

912

why not define Advice within the if below?

wenlei marked 6 inline comments as done.Jul 14 2020, 7:59 AM
wenlei added inline comments.
llvm/lib/Analysis/InlineAdvisor.cpp
169

This is consistent with addLocationToRemarks where we construct the location string for remarks, so the output can be consumed without change. Negative line offset is possible, we've seen that in FDO profile, and they're encoded with unsigned int there too - I don't know why that's the case, but seems there's convention and functionality-wise it works as long as everything is consistent..

llvm/lib/Transforms/IPO/SampleProfile.cpp
330

There's another call site from old pass manager pipeline, where we don't have FAM.

wenlei updated this revision to Diff 277835.Jul 14 2020, 8:00 AM

Address feedbacks.

mtrofin added inline comments.Jul 14 2020, 8:03 AM
llvm/lib/Analysis/InlineAdvisor.cpp
169

I don't think that makes it right, though. I'd suggest at least having a test here, then, i.e. bailing out if the this is greater than lhs. I'll look into the other API.

wenlei marked an inline comment as done.Jul 14 2020, 8:36 AM
wenlei added inline comments.
llvm/lib/Analysis/InlineAdvisor.cpp
169

I'm not sure if we want to bail out on negative offsets. I added addLocationToRemarks couple weeks back. Unsigned int was used there so we can match line offset from remarks to the line offset from FDO profile for negative offsets, which is important from tooling perspective.

Ideally we shouldn't have negative offset, but practically, we see it. If we put an assertion there, it will fire. Due to macro, there could be cases where LHS and RHS come from different files, hence the subtraction can lead to negative offset; and there could be other cases due to not having a line number on LHS so 0 is retrieved for LHS. We've been living with that for a while (see FunctionSamples::getOffset), and here's an example of FDO profile where negative offset is encoded using unsigned int (the last three lines).

Not advocating for that, but just thought asserting non-negative offset or fixing what's breaking that assertion is a separate work. And being consistent with what we've been doing (e.g. FunctionSamples::getOffset) should be good. What do you think?

ZSTD_decompressSequences_bmi2:736988668:1677
 0: 1557
 6: 93
 5: ZSTD_decompressSequences_body:736954111
  8: 1557
  9: 1557
  10: 1557
  11: 1557
  53: 109
  55: 109
  58: 0
  64693: 1520
  64940: 1564
  65080: 1264073

Can this be extended to the SCC inliner?

Can this be extended to the SCC inliner?

Yes, we can use it in SCC inliner as well. We just need extra plumbing there. We can do that in a separate change if needed.

wmi added a comment.Jul 14 2020, 11:02 AM

Can this be extended to the SCC inliner?

Yes, we can use it in SCC inliner as well. We just need extra plumbing there. We can do that in a separate change if needed.

Thanks for the work, using inline advisor to replay inline is exactly something we want too.

Currently every inline remark message has only one level: one caller, one callee and one callsite (maybe with multiple levels of inline stack associated with the callsite). One problem I am thinking of is: the inline advisor may only work with top-down inlining. In bottom-up inlining, if we decide to inline a specific callsite, it will actually be inlined at many places after its caller also being inlined into multiple places in the caller's callers. Since SCC inliner uses bottom-up inlining, current format of inline advise may not provide precise enough information to specify an exact location where inlining is expected to happen.

Have you ever considered to use a callpath to specify the inlining location?

mtrofin added inline comments.Jul 14 2020, 1:27 PM
llvm/lib/Analysis/InlineAdvisor.cpp
169

So I make sure I understand:

  • remarks already output unsigned ints
  • negative offsets are possible, due to cases like you mentioned

In this case, I agree that the choice of supporting negative offsets is a separate problem from this CL.

Could you add a comment explaining the current motivation for unsigned it? (i.e. match what remarks do)

Also, could it be uint32_t, to ensure 32 bit-ness?

Can this be extended to the SCC inliner?

Yes, we can use it in SCC inliner as well. We just need extra plumbing there. We can do that in a separate change if needed.

I'd be curious what the scenario in the SCC case would be. IIUC, here, the value is that you can replay decisions made with a profile, and use a different profile for everything else in the compiler. Hmm... I suppose maybe a similar scenario could be articulated for the SCC case?

On this - could you please add in the patch description the motivating scenario (helps with understanding) - thanks!

In D83743#2151020, @wmi wrote:

Can this be extended to the SCC inliner?

Yes, we can use it in SCC inliner as well. We just need extra plumbing there. We can do that in a separate change if needed.

Thanks for the work, using inline advisor to replay inline is exactly something we want too.

Currently every inline remark message has only one level: one caller, one callee and one callsite (maybe with multiple levels of inline stack associated with the callsite). One problem I am thinking of is: the inline advisor may only work with top-down inlining. In bottom-up inlining, if we decide to inline a specific callsite, it will actually be inlined at many places after its caller also being inlined into multiple places in the caller's callers. Since SCC inliner uses bottom-up inlining, current format of inline advise may not provide precise enough information to specify an exact location where inlining is expected to happen.

Have you ever considered to use a callpath to specify the inlining location?

Glad to know that it will be useful for your case too. You're right that with a bottom-up CGSCC inliner, we won't be able to replay arbitrary inline decisions. However I think that's a limitation of the inliner, not how inline decisions are represented. E.g. even if we have a full inline tree (subset of call graph), we still won't be able to replay that with CGSCC inliner if there's context-sensitive inline (specialization) in it.

The use case we had was for tuning full context-sensitive AutoFDO's early inlining - we can replay baseline AutoFDO early inlining there, or the other way around. We could also replay CGSCC inline of one build in another. The way it's implemented would work for those scenarios. But replay FDO early inline in CGSCC inline or the opposite may not get us what we wanted for the reasons you mentioned.

Can this be extended to the SCC inliner?

Yes, we can use it in SCC inliner as well. We just need extra plumbing there. We can do that in a separate change if needed.

I'd be curious what the scenario in the SCC case would be. IIUC, here, the value is that you can replay decisions made with a profile, and use a different profile for everything else in the compiler. Hmm... I suppose maybe a similar scenario could be articulated for the SCC case?

On this - could you please add in the patch description the motivating scenario (helps with understanding) - thanks!

Our current use case is not in the SCC inliner. But yeah, I image it could be useful there too. In general I feel some mechanism like this one to allow external input for tweaking inlining decision can be useful, mostly for tuning and experimental purpose. We currently don't have a good way of doing that, existing attributes like alwaysinline is function level, not call site level. And per-callsite inline intrinsic like https://reviews.llvm.org/D51200 is intrusive and hard to push through either. I thought this is a relatively easy and clean way of getting that functionality for tuning. I will update the description to include the motivation.

wenlei edited the summary of this revision. (Show Details)Jul 14 2020, 5:59 PM
wenlei updated this revision to Diff 278037.Jul 14 2020, 6:24 PM

Address feedback, update comment.

wmi added a comment.Jul 15 2020, 8:55 AM

Thanks for the explanation.

We could also replay CGSCC inline of one build in another. The way it's implemented would work for those scenarios.

When you replay CGSCC, the inline decision in replay file should have order? If it is unordered, I cannot see how it can replay CGSCC inline from another build exactly.

Our current use case is not in the SCC inliner. But yeah, I image it could be useful there too. In general I feel some mechanism like this one to allow external input for tweaking inlining decision can be useful, mostly for tuning and experimental purpose. We currently don't have a good way of doing that, existing attributes like alwaysinline is function level, not call site level. And per-callsite inline intrinsic like https://reviews.llvm.org/D51200 is intrusive and hard to push through either. I thought this is a relatively easy and clean way of getting that functionality for tuning.

That is the scenario I am interested -- using it for tweaking inlining decision for tuning/experimental purpose. I understand it could need a lot more work than this patch and we may not need to address them currently. I just want to know how it should work in your mind. Could you explain it in more detail?

In D83743#2153532, @wmi wrote:

Thanks for the explanation.

We could also replay CGSCC inline of one build in another. The way it's implemented would work for those scenarios.

When you replay CGSCC, the inline decision in replay file should have order? If it is unordered, I cannot see how it can replay CGSCC inline from another build exactly.

I might be missing something, but here's how I think about this. For a given inline decision, say D on A->B->C path, we would only evaluate once and make that decision once. Then for replay, assuming we follow the same bottom-up order, for each call site, we can retrieve that exact decision from input remarks, and as long as we make the same decisions, the sequence of call sites that gets exposed for evaluation will also be the same. Because we evaluate call site for each path only once, any decision recorded in remarks should be unique, then I'm not sure how order of the inline remarks interfere with replay.

Our current use case is not in the SCC inliner. But yeah, I image it could be useful there too. In general I feel some mechanism like this one to allow external input for tweaking inlining decision can be useful, mostly for tuning and experimental purpose. We currently don't have a good way of doing that, existing attributes like alwaysinline is function level, not call site level. And per-callsite inline intrinsic like https://reviews.llvm.org/D51200 is intrusive and hard to push through either. I thought this is a relatively easy and clean way of getting that functionality for tuning.

That is the scenario I am interested -- using it for tweaking inlining decision for tuning/experimental purpose. I understand it could need a lot more work than this patch and we may not need to address them currently. I just want to know how it should work in your mind. Could you explain it in more detail?

I am considering different modes for replay in the future, the first is strict mode like what we have here - we simply replay everything faithfully. The second mode is positive replay, e.g. we only apply extra inlining using the input. A third mode can be negative replay, that is we prohibit inlining specified in input without affecting others. Orthogonally, we sometime may also want to focus on a specific inline tree, so it may be useful to have an unknown decision from advisor, then we can fall back to other advisors for decisions outside of the inline tree specified in input.

wmi added inline comments.Jul 16 2020, 10:46 AM
llvm/lib/Analysis/ReplayInlineAdvisor.cpp
60–80

Can we extract this part to a function? I think it can be reused by other types of InlineAdvisor, for tuning purpose for example.

llvm/test/Transforms/SampleProfile/Inputs/inline-replay.txt
3

Why it is sum:1 instead of _Z3sumii:1 @ main:3.1?

wenlei marked 3 inline comments as done.Jul 16 2020, 11:43 AM
wenlei added inline comments.
llvm/lib/Analysis/ReplayInlineAdvisor.cpp
60–80

Sure, done.

llvm/test/Transforms/SampleProfile/Inputs/inline-replay.txt
3

Oh, this has to do with the contrived !dbg metadata where linkageName was missing. Changed both the remarks here and !dbg to include mangle names now.

wenlei updated this revision to Diff 278561.Jul 16 2020, 11:43 AM
wenlei marked an inline comment as done.

address feedbacks.

wenlei marked 2 inline comments as done.Jul 16 2020, 11:45 AM
wenlei added inline comments.
llvm/lib/Analysis/InlineAdvisor.cpp
169

Comment added, and also changed to use uint32_t.

wmi accepted this revision.Jul 16 2020, 12:07 PM

LGTM. Better wait another day to see if other reviewers have further comments.

This revision is now accepted and ready to land.Jul 16 2020, 12:07 PM
mtrofin accepted this revision.Jul 16 2020, 12:16 PM

lgtm - a small nit, could you also add in the description the current scope (the top down, SampleProfile case) - for clarity. Thanks!

wenlei edited the summary of this revision. (Show Details)Jul 16 2020, 12:36 PM
wenlei marked an inline comment as done.

lgtm - a small nit, could you also add in the description the current scope (the top down, SampleProfile case) - for clarity. Thanks!

Sure, updated. Thank you all for feedbacks and quick review!

This revision was automatically updated to reflect the committed changes.

The inline-replay.ll test that this added appears to be broken (as was noted by the pre-merge check https://reviews.llvm.org/harbormaster/unit/view/120108/) and is failing in our build.

The inline-replay.ll test that this added appears to be broken (as was noted by the pre-merge check https://reviews.llvm.org/harbormaster/unit/view/120108/) and is failing in our build.

Sorry for the breakage, this wasn't caught in my local build. Taking a look.. Feel free to revert to get unblock.