This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Report zero-count probe in profile instead of dangling probes.
ClosedPublic

Authored by hoy on Jun 11 2021, 9:42 AM.

Details

Summary

Previously dangling samples were represented by INT64_MAX in sample profile while probes never executed were not reported. This was based on an observation that dangling probes were only at a smaller portion than zero-count probes. However, with compiler optimizations, dangling probes end up becoming at large portion of all probes in general and reporting them does not make sense from profile size point of view. This change flips sample reporting by reporting zero-count probes instead. This enabled dangling probe to be represented by none (missing entry in profile). This has a couple benefits:

  1. Reducing sample profile size in optimize mode, even when the number of non-executed probes outperform the number of dangling probes, since INT64_MAX takes more space over 0 to encode.
  1. Binary size savings. No need to encode dangling probe anymore, since missing probes are treated as dangling in the profile reader.
  1. Reducing compiler work to track dangling probes. However, for probes that are real dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of mistreated as dangling probes.
  1. Improving counts quality by respecting the counts already collected on the non-dangling copy of a probe. A probe, when duplicated, gets two copies at runtime. If one of them is dangling while the other is not, merging the two probes at profile generation time will cause the real samples collected on the non-dangling one to be discarded. Not reporting the dangling counterpart will keep the real samples.
  1. Better readability.
  1. Be consistent with non-CS dwarf line number based profile. Zero counts are trusted by the compiler counts inferencer while missing counts will be inferred by the compiler.

Note that the current patch does include any work for #3. There will be follow-up changes.

For #1, I've seen for a large Facebook service, the text profile is reduced by 7%. For extbinary profile, the size of LBRProfileSection is reduced by 35%.

For #4, I have seen general counts quality for SPEC2017 is improved by 10%.

Diff Detail

Event Timeline

hoy created this revision.Jun 11 2021, 9:42 AM
hoy requested review of this revision.Jun 11 2021, 9:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 11 2021, 9:42 AM
hoy added a subscriber: spupyrev.
hoy planned changes to this revision.Jun 11 2021, 10:18 AM

Will come back with more numbers.

It is strange that representation of probes affects counts quality (and hence, generated binaries). Is this a temporary state (that hopefully will be improved in the future)?

In general I agree that the suggested representation (0 for zero-counts, missing for dangles) is way more readable, and thus, less-error-prone.

hoy added a comment.Jun 14 2021, 12:59 PM

It is strange that representation of probes affects counts quality (and hence, generated binaries). Is this a temporary state (that hopefully will be improved in the future)?

In general I agree that the suggested representation (0 for zero-counts, missing for dangles) is way more readable, and thus, less-error-prone.

The current change affects counts quality in that

  1. Reporting more dangling probes than real dead probes. I believe we haven't spotted all places in the compiler that can remove dangling probes, and in those cases dangle probes were treated as real dead. With the current patch, more probes including some real dead probes are reported as dangling. This could be good or bad, depending on benchmarks. Current results show this is better in general. We're thinking about identifying real dead probes in the compiler and specially mark them as the next step.
  1. Reporting what's already collected on the non-dangling sibling of a dangling probe. Previously reporting the danglingness of a probe caused real samples collected on other non-dangling copies of the same probe to be lost.

#2 can be improved by reporting both danglingness and real samples in future, if that's helpful We would need to figure out how to encode them together.

I collected more data for this change. For SPEC2017, besides the counts quality gain, I've also seen a 0.4% perf gain on average, mainly from 602.gcc_s, 623.xalancbmk_s and 625.x264_s. For cinder, a 8% counts quality gain with a neutral perf result.

hoy updated this revision to Diff 351974.Jun 14 2021, 1:04 PM

Updating D104129: [CSSPGO] Report zero-count probe in profile instead of dangling probes.

Thanks for working on this. Among all deleted probes, if the majority of them are actually dangling (i.e removed due to hoisting, merging like optimization as opposed to dce), it makes sense to default to dangling and explicitly marking non-dangling. Previously we assumed that majority of deleted probes are not dangled, hence we choose to mark dangled ones explicitly.

It seems that our assumption has been wrong, which is why this change actually improves profile quality.

This enabled dangling probe to be represented by both INT64_MAX and none (missing entry in profile).

Why do we still want to allow INT64_MAX representation of dangling probe? Would it be cleaner to only use missing probe to represent dangled probes?

Reducing compiler work to track dangling probes. However, for probes that are real dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of mistreated as dangling probes.

While explicitly marking real dead probes can be a follow up patch. Should we remove moveAndDanglePseudoProbes completely together with this patch?

Moreover, since we treat missing probe the same as dangling probe in getProbeWeight and hence in profile inference, it seems we no longer need the concept of dangling. All we need is unknown (missing) vs known (explicitly marked 0 in the future for dead code, or probe with a real non-zero count). The term dangling come from the pass1 optimization side, dangle reflects how a probe looks after its containing block is removed, now if we don't need to track such probes from pass1 optimization side, it seems the whole concept of dangling can go away to simply things?

llvm/tools/llvm-profgen/ProfileGenerator.cpp
554

Mainly for saving profile size can be a misleading comment. The main benefits we see are: 1) better profile quality when we default all missing probe to be unknown (vs previously we only treat marked probes as unknown), since we have more unknown probes than dead probes. 2) allowing probe with count to take precedence over dangling ones when merging.

If doing this regress profile quality, we probably won't do it even if it leads to smaller profile size. If we go further down this route, we may end up removing InvalidProbeCount altogether, then saving profile size can be confusing to others as others wouldn't know where we came from.

558

Not that we don't do anything for dangling probes, can we remove the moveAndDanglePseudoProbes from compiler too? These will be missing, and not having 0 without any special handling.

Furthermore, this is the only place we generate InvalidProbeCount, with this change, all the special casing for InvalidProbeCount can be removed too?

Be consistent with non-probe profile.

AutoFDO profile usually does not fill in zeros. counts below a certain threshold is omitted to save profile size.

Now in order to differentiate from unknown, the approach taken here is to always mark any known count including zero. In that sense this is different from AutoFDO (actually before this change, dangling probes are marked so zero counts can be omitted which is closer to AutoFDO for representing sparse profile).

Improving counts quality by respecting the counts already collected on the non-dangling sibling of the danling probe

nit on the wording, sibling leads others to think it's a different probe under the same probe inline tree. Here, it's really just copies of the same probe (probes sharing the same Id).

hoy added a comment.Jun 15 2021, 9:32 AM

Thanks for working on this. Among all deleted probes, if the majority of them are actually dangling (i.e removed due to hoisting, merging like optimization as opposed to dce), it makes sense to default to dangling and explicitly marking non-dangling. Previously we assumed that majority of deleted probes are not dangled, hence we choose to mark dangled ones explicitly.

It seems that our assumption has been wrong, which is why this change actually improves profile quality.

This enabled dangling probe to be represented by both INT64_MAX and none (missing entry in profile).

Why do we still want to allow INT64_MAX representation of dangling probe? Would it be cleaner to only use missing probe to represent dangled probes?

Yes, I was thinking about to have it completely removed separately. With this patch, we can still load profiles with the INT64_MAX annotation.

Reducing compiler work to track dangling probes. However, for probes that are real dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of mistreated as dangling probes.

While explicitly marking real dead probes can be a follow up patch. Should we remove moveAndDanglePseudoProbes completely together with this patch?

I plan to do it separately. Besides removing moveAndDanglePseudoProbes, we also need to remove dangling probes from empty blocks, i.e, the code in pseudo-probe-inserter.cpp.

Moreover, since we treat missing probe the same as dangling probe in getProbeWeight and hence in profile inference, it seems we no longer need the concept of dangling. All we need is unknown (missing) vs known (explicitly marked 0 in the future for dead code, or probe with a real non-zero count). The term dangling come from the pass1 optimization side, dangle reflects how a probe looks after its containing block is removed, now if we don't need to track such probes from pass1 optimization side, it seems the whole concept of dangling can go away to simply things?

Yes, we do not need the concept of dangling probes anymore, instead, we'll need the concept of real dead probes. The check for dangling probe in pass1 will be replaced with checking for real dead probes.

llvm/tools/llvm-profgen/ProfileGenerator.cpp
554

Sounds good. Will update comment.

558

Yes, planning to do it separately. There was the original patch that introduced dangling probe. Hopefully we can just revert it. What do you think?

hoy updated this revision to Diff 352170.Jun 15 2021, 9:59 AM

Addressing Wenlei's comment.

hoy updated this revision to Diff 352277.Jun 15 2021, 3:53 PM

Removing InvalidProbeCount definition and uses.

Be consistent with non-probe profile.

AutoFDO profile usually does not fill in zeros. counts below a certain threshold is omitted to save profile size.

Now in order to differentiate from unknown, the approach taken here is to always mark any known count including zero. In that sense this is different from AutoFDO (actually before this change, dangling probes are marked so zero counts can be omitted which is closer to AutoFDO for representing sparse profile).

Improving counts quality by respecting the counts already collected on the non-dangling sibling of the danling probe

nit on the wording, sibling leads others to think it's a different probe under the same probe inline tree. Here, it's really just copies of the same probe (probes sharing the same Id).

Please update description: 1) not using the term sibling probe, 2), remove or update "Be consistent with non-probe profile.". 3), update "be represented by both INT64_MAX and none (missing entry in profile)"

llvm/tools/llvm-profgen/ProfileGenerator.cpp
554

If you mention dangling in the comments, make sure it will be updated when we remove the concept of dangling altogether.

What I actually meant is that we don't need to explain the benefit comparing to the old approach since the old approach will be completely gone. Just comment it as if we arrive at this from a clean slate, in which case there should be no place for dangling probe. The extra context is good for commit message, but it's not so relevant for comment. Here we're simply marking probes that don't have any sample hits with zero count so we can differentiate probe with known count from unknown (deleted or other reason).

hoy added inline comments.Jun 15 2021, 9:52 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
554

I can remove the dangling term now, but feel that we need another term for the missing probes, or to mention why zero is reported. Maybe something like "Reporting zero for non-executed probes. The compiler will infer counts for deleted probes"?

wenlei added inline comments.Jun 15 2021, 10:11 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
554

Yeah something like "Explicitly assign zero count for remaining probes without sample hits. This is to differentiate from removed probes whose count is unknown and will be inferred in the compiler". Missing probe is probe optimized away, it should be intuitive. Not sure if we need a special term for that..

One thing is the removal of the term about dangling; the other is we don't need to explain all the benefit when the base will be gone.

hoy edited the summary of this revision. (Show Details)Jun 15 2021, 10:54 PM
hoy updated this revision to Diff 352342.Jun 15 2021, 10:55 PM

Updating summary and comments.

wenlei accepted this revision.Jun 15 2021, 11:06 PM

Be consistent with dwarf line number based profile.

This is not accurate, see earlier comments. What i meant was not about "non-probe profile -> dwarf line number profile".

Otherwise lgtm. Thanks.

This revision is now accepted and ready to land.Jun 15 2021, 11:06 PM
hoy added a comment.Jun 15 2021, 11:20 PM

Be consistent with dwarf line number based profile.

This is not accurate, see earlier comments. What i meant was not about "non-probe profile -> dwarf line number profile".

Otherwise lgtm. Thanks.

It should be consistent with non-CS dwarf-based profile. CS dwarf-based profile doesn't have either zero or dangling reported, likely needs a fix. Is that what you meant?

Be consistent with dwarf line number based profile.

This is not accurate, see earlier comments. What i meant was not about "non-probe profile -> dwarf line number profile".

Otherwise lgtm. Thanks.

It should be consistent with non-CS dwarf-based profile. CS dwarf-based profile doesn't have either zero or dangling reported, likely needs a fix. Is that what you meant?

Non-cs dwarf-based profile normally doesn't have zero counts in the profile. Any count below a certain threshold is omitted in the profile to save size. This is the default behavior for our production usage (I believe for google as well). This is why the current change that needs explicit zero is not consistent with afdo.

hoy added a comment.Jun 15 2021, 11:38 PM

Be consistent with dwarf line number based profile.

This is not accurate, see earlier comments. What i meant was not about "non-probe profile -> dwarf line number profile".

Otherwise lgtm. Thanks.

It should be consistent with non-CS dwarf-based profile. CS dwarf-based profile doesn't have either zero or dangling reported, likely needs a fix. Is that what you meant?

Non-cs dwarf-based profile normally doesn't have zero counts in the profile. Any count below a certain threshold is omitted in the profile to save size. This is the default behavior for our production usage (I believe for google as well). This is why the current change that needs explicit zero is not consistent with afdo.

I see. I'm seeing zero counts in non-CS dwarf-based profile with our internal profile generation tool like below. I guess the production profile is generated with additional switches?

main:139344172:0
6: 0
9: 0
17: 0
18: 0
19: 0
20: 0
21: 0
23: 0
27: 0
28: 0
31: 0
32: 0
33: 0
34: 0
36: 0
41: 0
55: 0
64: 0
79: 0
84: 0
65411: 0
33: atoi:0

2: 0
65175: 0

39: read_min:27098

14: 0
17: 0
hoy edited the summary of this revision. (Show Details)Jun 15 2021, 11:51 PM
wlei added a comment.Jun 15 2021, 11:57 PM

Curious why we can remove the moveAndDanglePseudoProbes(don't need the concept of dangling)? My understanding is we treat dangling as unknown, at pass1 we need to pass this information to the profiled binary. An optimized-out probe won't have samples, it will be treated as zero if not marked as dangling.

hoy added a comment.Jun 16 2021, 9:20 AM

Curious why we can remove the moveAndDanglePseudoProbes(don't need the concept of dangling)? My understanding is we treat dangling as unknown, at pass1 we need to pass this information to the profiled binary. An optimized-out probe won't have samples, it will be treated as zero if not marked as dangling.

We now use "missing" to represent dangling which is natural, and "missing" is different from zero. The counts inferencer will reason about counts for missing probes. We will flip the concept of dangling to real dead, and likely make a routine like moveAndKillPseudoProbes.

wlei added a comment.Jun 16 2021, 9:46 AM

Curious why we can remove the moveAndDanglePseudoProbes(don't need the concept of dangling)? My understanding is we treat dangling as unknown, at pass1 we need to pass this information to the profiled binary. An optimized-out probe won't have samples, it will be treated as zero if not marked as dangling.

We now use "missing" to represent dangling which is natural, and "missing" is different from zero. The counts inferencer will reason about counts for missing probes. We will flip the concept of dangling to real dead, and likely make a routine like moveAndKillPseudoProbes.

I see, thanks for the clarification!

wlei accepted this revision.Jun 16 2021, 9:47 AM

LGTM, thanks!

hoy updated this revision to Diff 352469.Jun 16 2021, 10:05 AM

Fixing clang-tidy issue.

wmi added a comment.Jun 16 2021, 10:57 AM

Be consistent with dwarf line number based profile.

This is not accurate, see earlier comments. What i meant was not about "non-probe profile -> dwarf line number profile".

Otherwise lgtm. Thanks.

It should be consistent with non-CS dwarf-based profile. CS dwarf-based profile doesn't have either zero or dangling reported, likely needs a fix. Is that what you meant?

Non-cs dwarf-based profile normally doesn't have zero counts in the profile. Any count below a certain threshold is omitted in the profile to save size. This is the default behavior for our production usage (I believe for google as well). This is why the current change that needs explicit zero is not consistent with afdo.

Google keeps zero counts in the profile. It treats missing lines conservatively and tries to infer their hotness. I vaguely remember there is a flag added to control it.

wmi accepted this revision.Jun 16 2021, 11:19 AM

In "6. Be consistent with non-CS dwarf line number based profile", if you can mention how the compiler treat missing line number and missing pseudo in the description as a record, that will be helpful.

Google keeps zero counts in the profile. It treats missing lines conservatively and tries to infer their hotness. I vaguely remember there is a flag added to control it.

Thanks for pointing out. Actually Hongtao and I went back and checked our setup, we also keep lines with zero. I was confused with the threshold on total samples for function. Sorry about that.

In "6. Be consistent with non-CS dwarf line number based profile", if you can mention how the compiler treat missing line number and missing pseudo in the description as a record, that will be helpful.

+1.

hoy edited the summary of this revision. (Show Details)Jun 16 2021, 11:29 AM
This revision was landed with ongoing or failed builds.Jun 16 2021, 11:46 AM
This revision was automatically updated to reflect the committed changes.