This is an archive of the discontinued LLVM Phabricator instance.

[AutoFDO] Use getHeadSamplesEstimate instead of getTotalSamples to compute profile callsite staleness
ClosedPublic

Authored by wlei on Dec 14 2022, 3:20 PM.

Details

Summary

Fix two issues for profile staleness report.

  1. It should be more accurate to use the sum of all entry count(getHeadSamplesEstimate) for the callsite samples than the total samples, since even the top-level callsite is mismatched, it does affect the inlining but it can still be merged into base profile and used later.
  1. I accidentally missed to persist the num of mismatched callsite into binary.

Also added the asm testing to test the decoding of the section.

Diff Detail

Event Timeline

wlei created this revision.Dec 14 2022, 3:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2022, 3:20 PM
wlei requested review of this revision.Dec 14 2022, 3:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2022, 3:20 PM
wlei retitled this revision from [AutoFDO]Use getHeadSamplesEstimate instead of getTotalSamples to compute profile callsite staleness to [AutoFDO] Use getHeadSamplesEstimate instead of getTotalSamples to compute profile callsite staleness.Dec 14 2022, 3:28 PM
wlei edited the summary of this revision. (Show Details)
wlei added reviewers: hoy, wenlei.
hoy accepted this revision.Dec 15 2022, 9:10 AM
hoy added inline comments.
llvm/lib/Transforms/IPO/SampleProfile.cpp
2187

nit: NumMismatchedCallsite -> NumMismatchedCallsites, TotalProfiledCallsite -> TotalProfiledCallsites

This revision is now accepted and ready to land.Dec 15 2022, 9:10 AM
wlei updated this revision to Diff 483218.Dec 15 2022, 9:49 AM

addressing feedback.

I thought the idea is to compute the % of samples being dropped due to mismatch? in this case, all samples from callsite will be dropped, so I actually don't see a problem with using getTotalSamples. Yes you can argue that they can be merged with base profile, but then that argument applies to getHeadSamplesEstimate too?

wlei added a comment.Dec 15 2022, 10:30 AM

I thought the idea is to compute the % of samples being dropped due to mismatch? in this case, all samples from callsite will be dropped, so I actually don't see a problem with using getTotalSamples. Yes you can argue that they can be merged with base profile, but then that argument applies to getHeadSamplesEstimate too?

Yeah, it's tricky to quantify the dropped callsite samples's impact. I was thinking in a reversed way, if we use getTotalSamples, I feel like the total samples are completely dropped(not merged into top-level profile), in fact it's still in use. And for getHeadSamplesEstimate, I feel it's like only the callsite call/jump's samples are dropped, but I admit that this's also not accurate, the missing inlining from big total samples could affect more on perf. I don't have strong opinion on this.

hoy added a comment.Dec 15 2022, 10:40 AM

I thought the idea is to compute the % of samples being dropped due to mismatch? in this case, all samples from callsite will be dropped, so I actually don't see a problem with using getTotalSamples. Yes you can argue that they can be merged with base profile, but then that argument applies to getHeadSamplesEstimate too?

Yeah, it's tricky to quantify the dropped callsite samples's impact. I was thinking in a reversed way, if we use getTotalSamples, I feel like the total samples are completely dropped(not merged into top-level profile), in fact it's still in use. And for getHeadSamplesEstimate, I feel it's like only the callsite call/jump's samples are dropped, but I admit that this's also not accurate, the missing inlining from big total samples could affect more on perf. I don't have strong opinion on this.

Reporting mismatched samples based on getHeadSamplesEstimate sounds a bit more accurate to me, thought it's not perfect either. The direct effect of mismatched callsite samples is likely a missing inlining. Reporting callee total samples for this may not give a good signal, especially when the callee is really big but the callsite isn't very hot.

We have some internal services showing a very high callsite mismatch rate like 30%. Wondering if that could be related.

wlei added a comment.Dec 15 2022, 10:42 AM

I thought the idea is to compute the % of samples being dropped due to mismatch? in this case, all samples from callsite will be dropped, so I actually don't see a problem with using getTotalSamples. Yes you can argue that they can be merged with base profile, but then that argument applies to getHeadSamplesEstimate too?

Or since it's arguable, needed more discussion/experiments, I can make a separate patch for that, make this to fix the obvious issues first.

I thought the idea is to compute the % of samples being dropped due to mismatch? in this case, all samples from callsite will be dropped, so I actually don't see a problem with using getTotalSamples. Yes you can argue that they can be merged with base profile, but then that argument applies to getHeadSamplesEstimate too?

Yeah, it's tricky to quantify the dropped callsite samples's impact. I was thinking in a reversed way, if we use getTotalSamples, I feel like the total samples are completely dropped(not merged into top-level profile), in fact it's still in use. And for getHeadSamplesEstimate, I feel it's like only the callsite call/jump's samples are dropped, but I admit that this's also not accurate, the missing inlining from big total samples could affect more on perf. I don't have strong opinion on this.

I guess it depends on how we define "callsite samples" -- is it about the actual call count, or about the total samples of callees. It has some ambiguity. Maybe keep this change, but call it mismatched call site count instead?

I thought the idea is to compute the % of samples being dropped due to mismatch? in this case, all samples from callsite will be dropped, so I actually don't see a problem with using getTotalSamples. Yes you can argue that they can be merged with base profile, but then that argument applies to getHeadSamplesEstimate too?

Yeah, it's tricky to quantify the dropped callsite samples's impact. I was thinking in a reversed way, if we use getTotalSamples, I feel like the total samples are completely dropped(not merged into top-level profile), in fact it's still in use. And for getHeadSamplesEstimate, I feel it's like only the callsite call/jump's samples are dropped, but I admit that this's also not accurate, the missing inlining from big total samples could affect more on perf. I don't have strong opinion on this.

Reporting mismatched samples based on getHeadSamplesEstimate sounds a bit more accurate to me, thought it's not perfect either. The direct effect of mismatched callsite samples is likely a missing inlining. Reporting callee total samples for this may not give a good signal, especially when the callee is really big but the callsite isn't very hot.

We have some internal services showing a very high callsite mismatch rate like 30%. Wondering if that could be related.

Logically I think a call site focused metric makes sense. But the way CallsiteSamples is defined led people to think this is total samples. I think perhaps we just need to be explicit in the naming, so it's clear that we're tracking call site counts, not call site samples..

wlei added a comment.Dec 15 2022, 10:59 AM

I thought the idea is to compute the % of samples being dropped due to mismatch? in this case, all samples from callsite will be dropped, so I actually don't see a problem with using getTotalSamples. Yes you can argue that they can be merged with base profile, but then that argument applies to getHeadSamplesEstimate too?

Yeah, it's tricky to quantify the dropped callsite samples's impact. I was thinking in a reversed way, if we use getTotalSamples, I feel like the total samples are completely dropped(not merged into top-level profile), in fact it's still in use. And for getHeadSamplesEstimate, I feel it's like only the callsite call/jump's samples are dropped, but I admit that this's also not accurate, the missing inlining from big total samples could affect more on perf. I don't have strong opinion on this.

Reporting mismatched samples based on getHeadSamplesEstimate sounds a bit more accurate to me, thought it's not perfect either. The direct effect of mismatched callsite samples is likely a missing inlining. Reporting callee total samples for this may not give a good signal, especially when the callee is really big but the callsite isn't very hot.

We have some internal services showing a very high callsite mismatch rate like 30%. Wondering if that could be related.

Logically I think a call site focused metric makes sense. But the way CallsiteSamples is defined led people to think this is total samples. I think perhaps we just need to be explicit in the naming, so it's clear that we're tracking call site counts, not call site samples..

Sounds good to change to the callsite counts. CallsiteSamples in the nested profile is indeed the total samples of the callee, which is a confusing name.

wenlei accepted this revision.Dec 15 2022, 11:01 AM

actually you already record it as NumMismatchedCallsites, so I think it should be good. :)

wlei added a comment.Dec 15 2022, 11:09 AM

actually you already record it as NumMismatchedCallsites, so I think it should be good. :)

Ah, Okay! Also avoid the Num and Count name confusing :)