This is an archive of the discontinued LLVM Phabricator instance.

[SamplePGO] Fix ICE that callee samples returns null while finding import functions
ClosedPublic

Authored by wlei on Jul 6 2023, 11:10 AM.

Details

Summary

We found that in a special condition, the input callee Samples is null for findExternalInlineCandidate, which caused an ICE.

In some rare cases, call instruction could be changed after being pushed into inline candidate queue, this is because earlier inlining may expose constant propagation which can change indirect call to direct call. When this happens, we may fail to find matching function samples for the candidate later(for example if the profile is stale), even if a match was found when the candidate was enqueued.

See this reduced program:

file1.c:

 int bar(int x);

 int(*foo())()  {
  return bar;
 };

void func()
 {
     int (*fptr)(int);
     fptr = foo();
     a +=  (*fptr)(10);
 }

file2.c:

int bar(int x) { return x + 1;}

The two CALL: foo and (*ptr) are pushed into the queue at the beginning, say foo is hotter and popped first for inlining. During the inlining of foo, it performs the constant propagation for the function pointer bar and then changed (*ptr) to a direct call bar(..). Note that at this time, (*ptr)/bar is still in the queue, later while it's popped out for inlining, it use the a different target name(bar) to look for the callee samples. At the same time, if the profile is stale and the new function is different from the old function in the profile, then this led the return of the null callee sample.

Diff Detail

Event Timeline

wlei created this revision.Jul 6 2023, 11:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2023, 11:10 AM
wlei requested review of this revision.Jul 6 2023, 11:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2023, 11:10 AM
wlei edited the summary of this revision. (Show Details)Jul 6 2023, 11:35 AM
wlei added reviewers: hoy, wenlei.
hoy added a comment.Jul 6 2023, 12:58 PM

Consider changing to title to "[SamplePGO] ..." since this is fixing an issue not specific to CSSPGO.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1083

typo: Replayed

1087

Could mention more in the comment about that this is a problem with a stale profile.

wlei retitled this revision from [CSSPGO] Fix ICE that callee samples returns null while finding import functions to [SamplePGO] Fix ICE that callee samples returns null while finding import functions.Jul 6 2023, 3:08 PM
wlei edited the summary of this revision. (Show Details)
wlei updated this revision to Diff 537897.Jul 6 2023, 3:09 PM
wlei edited the summary of this revision. (Show Details)

addressing feedback

wenlei added inline comments.Jul 6 2023, 7:37 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

If it's not beneficial to inline/import (i.e. getExternalInlineAdvisorShouldInline returns false), should we just return without adding the callee to import candidate GUID set?

1087

The popped out call instruction is different from what was pushed.
we only push the CALL instruction when we find its callee sample is not null to make sure we get a non-null samples when we pop the CALL.

nit: make it explicit as to push and pop out of what.. here it's the inline candidate queue.

I would say something like this:

In some rare cases, call instruction could be changed after being pushed into inline candidate queue, this is because earlier inlining may expose const-prop which can change indirect call to direct call. When this happens, we may fail to find matching function samples for the candidate later, even if a match was found when the candidate was enqueued.

wlei added inline comments.Jul 6 2023, 11:18 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

Good question.
I'm just reading the code of getExternalInlineAdvisorShouldInline , it looks to me getExternalInlineAdvisorShouldInline return false doesn't mean it's not beneficial to inlining/import, it's only for replay, which is an uncommon situation, so most of time it return false, we still continue the importing.
Say if we ignore the branch(line 1078), the question becomes : should we import a callee whose samples is null?

This actually didn't happen before, that's why there is an assertion. OK. but as this is an assertion, it seems it just means we shouldn't import it.

so the change can be simplified to change the assertion to a return, WDYT?

@hoy I remember we discussed offline, you suggested to import it, otherwise it would miss the inlining opportunity for this call in post-link.

Also the below logic for non-null samples is to import the hot callee, since this is exposed by stale profile, so if the new function is hot, we have to import it.. but we don't know if the new function is hot or not...

wlei added inline comments.Jul 6 2023, 11:26 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

Say bar is just renamed to baz, we actually need to analyze bar's samples to decide the import lists, but it's a function renaming matching which is a big topic.

wenlei added inline comments.Jul 7 2023, 9:17 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

Lei, I think you're right. Before replay inline was added (D112033), there was only this assertion. I think it makes sense to just return when Samples is null instead of assert non-null, with the explanation of the case we ran into. This should be really rare, so perf wise it probably won't matter import or not.

hoy added inline comments.Jul 7 2023, 9:20 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

Importing it is probably not that beneficial but it matches the behavior of in-module inlining.

hoy added inline comments.Jul 7 2023, 9:28 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1083

How about framing the comment like this:

An inline candidate that is originally an indirect call can be optimized to a direct call due to the inlining of other candidates (e.,g constant propagation of the function pointer). This may cause the an lookup error of the candidate samples if the promoted call target doesn't exist in the profile due to profile staleness. Since it's known that the callsite is a good candidate and it should still be good to inline it regardless. This also matches the behavior of in-module inlining.

wenlei added inline comments.Jul 7 2023, 10:34 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

it matches the behavior of in-module inlining.

Being in the inline candidate queue doesn't mean it will be inlined if it's not external. It only means it will be evaluated. The actual inline decision is in tryInlineCandidate->shouldInlineCandidate (for direct call), and at this point there is no guarantee that it would be inlined if it's not external.

hoy added inline comments.Jul 7 2023, 10:37 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

Right, importing just offers an opportunity for it to be inlined. Both in-module and external inlining have to go through the inliner's cost model once the IR is ready.

wenlei added inline comments.Jul 7 2023, 10:41 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

I was just confused by what you meant "matches the behavior of in-module inlining".

For local functions, it's available, and we evaluate every function calls. For external functions, we can't possibly make all functions available like local functions, so we have to choose to import those that are likely to be inlined based on hotness.

If we import everything regardless of hotness, that would "matches the behavior of in-module inlining", but that's not practical, and not what we do today.

wlei added inline comments.Jul 7 2023, 10:44 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

Importing it is probably not that beneficial but it matches the behavior of in-module inlining.

@hoy On a second thought, speaking of "matches the behavior of in-module inlining.", we also don't inline a callee function whose samples is mismatched/null, as this is "sample profile inlining". If it's other inliner(CGSCC) inline it, other inliner should do the importing.

So maybe just returning null make more sense?

hoy added inline comments.Jul 7 2023, 10:49 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

By matching the behavior of in-module inlining, I mean the case of indirect call promoted to a direct local callee, in which case the promoted callee will be considered inlining without going through a sample lookup:

https://github.com/llvm/clangir/blob/b7d9322b4963e620dfd12246816e6f7b2da5fd88/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1500

In our case, the callee is external. I think it should be imported and go through the same logic above.

hoy added inline comments.Jul 7 2023, 10:54 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

Oh, there's a subtle difference here. Even if the callee is imported, it'll still not considered an inline candidate because of the sample lookup error in postlink. If the callsite has branch metadata, the callee may be inlined by postlink CGSCC. But that's not a match of sample profile inliner. So returning null is more reasonable.

wenlei added inline comments.Jul 7 2023, 11:01 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

indirect call promoted to a direct local callee, in which case the promoted callee will be considered inlining without going through a sample lookup

This is different - the promoted call still go through hotness evaluation (tryPromoteAndInlineCandidate->tryInlineCandidate->shouldInlineCandidate).

The question is, whether we filter on hotness before we decided to import, the answer is yes, and here we don't have hotness, so we should not import.

hoy added inline comments.Jul 7 2023, 11:02 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

An alternative as discussed offline, place a sample lookup check right after popping an candiate. This should line up in-module and external inlining for our case:

https://github.com/llvm/clangir/blob/b7d9322b4963e620dfd12246816e6f7b2da5fd88/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1443

But it sounds a bit over-engineering as the case isn't common, and matching behavior isn't the demanding.

wenlei added inline comments.Jul 7 2023, 11:07 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

matching behavior

I think "sample lookup" is a detail that is not important to match. If we have to match, I'd argue to match the check on hotness before inlining/importing - in which case, we shouldn't import here.

hoy added inline comments.Jul 7 2023, 11:08 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

This is different - the promoted call still go through hotness evaluation (tryPromoteAndInlineCandidate->tryInlineCandidate->shouldInlineCandidate).

I was thinking the imported callee can still go though this but just realized it won't even be considered a candidate as it doesn't have the nested callee profile anywhere.

The question is, whether we filter on hotness before we decided to import, the answer is yes, and here we don't have hotness, so we should not import.

I think given a stale profile, the current behavior is to accidentally trust the hotness in the profile and let it drive the inlining for a promoted icall. If we don't want this, we should consider the alternative above.

wlei added inline comments.Jul 7 2023, 11:13 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

Yeah, bar is changed to baz, ideally we should find the match bar --> baz beforehand, and use bar's hotness to do the importing, right now we don't have the function renaming match yet.

hoy added inline comments.Jul 7 2023, 11:14 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

I'd argue to match the check on hotness before inlining/importing - in which case, we shouldn't import here.

The hotness check is already passed for an inline candidate. But that's a staleness hotness for the old icall targets. If we want to trust it we can do the importing, but as I just realized, it wouldn't help postlink sample loader. If we don't want to trust it, we should consider excluding the in-module case as well.

hoy added inline comments.Jul 7 2023, 11:17 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

Something I was thinking as an alternative:

// Perform iterative BFS call site prioritized inlining
  bool Changed = false;
  while (!CQueue.empty() && F.getInstructionCount() < SizeLimit) {
    InlineCandidate Candidate = CQueue.top();
    CQueue.pop();
    CallBase *I = Candidate.CallInstr;

    // The callsite has been transformed and has no samples, thus should be excluded from inlining and importing
    if (findCalleeFunctionSamples(*I))
       continue;
wenlei added inline comments.Jul 7 2023, 11:20 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

The hotness check is already passed for an inline candidate.

Where did it pass the hotness check? we put calls from getInlineCandidate into the candidate queue, and getInlineCandidate doesn't do any hotness check.

hoy added inline comments.Jul 7 2023, 11:24 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

Where did it pass the hotness check? we put calls from getInlineCandidate into the candidate queue, and getInlineCandidate doesn't do any hotness check.

Oh I see. There was no hotness check when it was added to the queue.

wenlei added inline comments.Jul 7 2023, 11:54 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

Where did it pass the hotness check? we put calls from getInlineCandidate into the candidate queue, and getInlineCandidate doesn't do any hotness check.

Oh I see. There was no hotness check when it was added to the queue.

Now, do we all agree that a hotness check should be needed before importing, and since there's no hotness check, let's just return on null samples, without importing?

hoy added inline comments.Jul 7 2023, 12:02 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

Returning null sounds good to me, as this is not a common case perhaps we do not need to be fancy.

wlei added inline comments.Jul 7 2023, 12:10 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1082–1083

Thanks for the discussion! I will change to return null.

wlei updated this revision to Diff 538225.Jul 7 2023, 12:15 PM
  1. change to simply return(not add to the import list)
  2. update the comments
hoy accepted this revision.Jul 7 2023, 1:21 PM

LGTM, thanks.

This revision is now accepted and ready to land.Jul 7 2023, 1:21 PM
wenlei accepted this revision.Jul 7 2023, 2:15 PM

nit: update the change description too based on the new comment (describing it using push/pop isn't very intuitive). otherwise lgtm.

wlei edited the summary of this revision. (Show Details)Jul 7 2023, 2:53 PM
This revision was landed with ongoing or failed builds.Jul 7 2023, 2:57 PM
This revision was automatically updated to reflect the committed changes.