This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO][llvm-profgen] Always report dangling probes for frames with real samples.
ClosedPublic

Authored by hoy on Apr 9 2021, 3:43 PM.

Details

Summary

Report dangling probes for frames that have real samples collected. Dangling probes are the probes associated to an empty block. When reported, sample count on a dangling probe will not be trusted by the compiler and we will rely on the counts inference algorithm to get the probe a reasonable count. This actually fixes a bug where previously only those dangling probes with samples collected were reported.

This patch also fixes two existing issues. Pseudo probes are stored in Address2ProbesMap and their pointers are used in PseudoProbeInlineTree. Previously std::vector was used to store probes and the pointers to probes may get obsolete as the vector grows. I'm changing std::vector to std::list instead.

The other issue is that all outlined functions shared the same inline frame previously due to the unchanged Index value as the dummy inlineSite identifier.

Good results seen for SPEC2017 in general regarding profile quality.

Diff Detail

Event Timeline

hoy created this revision.Apr 9 2021, 3:43 PM
hoy requested review of this revision.Apr 9 2021, 3:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 9 2021, 3:43 PM
hoy updated this revision to Diff 336574.Apr 9 2021, 3:45 PM

Updating D100235: [CSSPGO][llvm-profgen] Fixing an obselete iterator issue.

Another issue being fixed is that all outlined functions shared the same inline frame previously due to the unchanged Index value as the dummy inlineSite identifier.

Why is this an issue? What problem does this cause? Conceptually, if we want to give them a tree structure, top level inliner are indeed children of dummy root.

They could be used to visit all probes that are associated to an inline frame, which is something else I'll be working on.

We do have containers that can be re-allocated for growth, and doing that itself isn't an issue, unless it breaks specific use case. With what we have now we should not store pointers to inlinee's probe, and we're indeed doing that right now. So there's not bug, right?

If there's a future change that require changing this model (of not keeping probe pointers), perhaps make the vector->list change together with the change that has the new use case?

llvm/tools/llvm-profgen/PseudoProbe.cpp
217

nit: outline function -> top level inliner.

llvm/tools/llvm-profgen/PseudoProbe.h
75

Can we spell out the full return type? It seems we would use typedef if needed, but not so for auto return type.

hoy added a comment.Apr 11 2021, 11:36 PM

Another issue being fixed is that all outlined functions shared the same inline frame previously due to the unchanged Index value as the dummy inlineSite identifier.

Why is this an issue? What problem does this cause? Conceptually, if we want to give them a tree structure, top level inliner are indeed children of dummy root.

Yes, top level inliner is still children of dummy root with this change. However, they do not share dummy callsite id any more. Note that without this change, the dummy callsite id for top-level inliner (Index below) isn't updated correctly.

if (Root != Cur) {
   Index = readUnsignedNumber<uint32_t>();
 }

They could be used to visit all probes that are associated to an inline frame, which is something else I'll be working on.

We do have containers that can be re-allocated for growth, and doing that itself isn't an issue, unless it breaks specific use case. With what we have now we should not store pointers to inlinee's probe, and we're indeed doing that right now. So there's not bug, right?

If there's a future change that require changing this model (of not keeping probe pointers), perhaps make the vector->list change together with the change that has the new use case?

It doesn't break anything so far since PseudoProbeInlineTree.ProbeVector is not used anywhere. It might also be good to remove it for now since the container might end up with invalid pointers.

Probe pointers isn't an issue as long as they are valid. I'm using std::list instead of std::vector to store real probe objects so that taking address of them is never invalidated.

I'm actually working on a feature that reports all dangling probes in an inline frame into a profile, even if there are no samples collected for them. As of now only dangling probes with samples are reported. Since samples collected on a dangling probe doesn't make sense (dangling probes can be placed anywhere), they should always be reported so that the counts inferencer never overlooks them. It makes sense to include the current patch into that work.

Another issue being fixed is that all outlined functions shared the same inline frame previously due to the unchanged Index value as the dummy inlineSite identifier.

Why is this an issue? What problem does this cause? Conceptually, if we want to give them a tree structure, top level inliner are indeed children of dummy root.

Yes, top level inliner is still children of dummy root with this change. However, they do not share dummy callsite id any more. Note that without this change, the dummy callsite id for top-level inliner (Index below) isn't updated correctly.

if (Root != Cur) {
   Index = readUnsignedNumber<uint32_t>();
 }

Yeah, index would always be zero. But what problem does this cause? It seems that the call site id for top-level inliner is a dummy field anyways, so value should be ignored. Currently it works just fine, right? Is it also motivated by the upcoming change to report dangling probes with counts (so you need to different call site under dummy root)? If so, it might make sense to include it there as well..

They could be used to visit all probes that are associated to an inline frame, which is something else I'll be working on.

We do have containers that can be re-allocated for growth, and doing that itself isn't an issue, unless it breaks specific use case. With what we have now we should not store pointers to inlinee's probe, and we're indeed doing that right now. So there's not bug, right?

If there's a future change that require changing this model (of not keeping probe pointers), perhaps make the vector->list change together with the change that has the new use case?

It doesn't break anything so far since PseudoProbeInlineTree.ProbeVector is not used anywhere. It might also be good to remove it for now since the container might end up with invalid pointers.

Probe pointers isn't an issue as long as they are valid. I'm using std::list instead of std::vector to store real probe objects so that taking address of them is never invalidated.

I'm actually working on a feature that reports all dangling probes in an inline frame into a profile, even if there are no samples collected for them. As of now only dangling probes with samples are reported. Since samples collected on a dangling probe doesn't make sense (dangling probes can be placed anywhere), they should always be reported so that the counts inferencer never overlooks them. It makes sense to include the current patch into that work.

For dangling probe with counts, we currently use the special value (indicating dangling) instead of the actual count for output, right? What is the new behavior with upcoming change?

hoy added a comment.Apr 12 2021, 10:12 AM

Another issue being fixed is that all outlined functions shared the same inline frame previously due to the unchanged Index value as the dummy inlineSite identifier.

Why is this an issue? What problem does this cause? Conceptually, if we want to give them a tree structure, top level inliner are indeed children of dummy root.

Yes, top level inliner is still children of dummy root with this change. However, they do not share dummy callsite id any more. Note that without this change, the dummy callsite id for top-level inliner (Index below) isn't updated correctly.

if (Root != Cur) {
   Index = readUnsignedNumber<uint32_t>();
 }

Yeah, index would always be zero. But what problem does this cause? It seems that the call site id for top-level inliner is a dummy field anyways, so value should be ignored. Currently it works just fine, right? Is it also motivated by the upcoming change to report dangling probes with counts (so you need to different call site under dummy root)? If so, it might make sense to include it there as well..

The index starts from zero, and could be updated by parsing an inlinee. And when it comes back to the next top-level inliner, the value index is the leftover from the previous inlinee. This causes top-level inliner frames non-deterministically shared. The fix is motivated by upcoming dangling probe work but I feel that it may also affect inliner context building for shared top-level frames. I'll see if I can find a case for that. Otherwise I'll include this fix in the dangling probe patch as well.

They could be used to visit all probes that are associated to an inline frame, which is something else I'll be working on.

We do have containers that can be re-allocated for growth, and doing that itself isn't an issue, unless it breaks specific use case. With what we have now we should not store pointers to inlinee's probe, and we're indeed doing that right now. So there's not bug, right?

If there's a future change that require changing this model (of not keeping probe pointers), perhaps make the vector->list change together with the change that has the new use case?

It doesn't break anything so far since PseudoProbeInlineTree.ProbeVector is not used anywhere. It might also be good to remove it for now since the container might end up with invalid pointers.

Probe pointers isn't an issue as long as they are valid. I'm using std::list instead of std::vector to store real probe objects so that taking address of them is never invalidated.

I'm actually working on a feature that reports all dangling probes in an inline frame into a profile, even if there are no samples collected for them. As of now only dangling probes with samples are reported. Since samples collected on a dangling probe doesn't make sense (dangling probes can be placed anywhere), they should always be reported so that the counts inferencer never overlooks them. It makes sense to include the current patch into that work.

For dangling probe with counts, we currently use the special value (indicating dangling) instead of the actual count for output, right? What is the new behavior with upcoming change?

The new behavior lets you always report dangling probe regardless of whether there're samples collected for them. Dangling probes may be placed anywhere and if they are accidentally placed next to an instruction not executed, currently the dangling probe will not be reported. And when it comes to the counts inferencer, it won't know the probe is dangling. It'll treat the probe as never executed.

Another issue being fixed is that all outlined functions shared the same inline frame previously due to the unchanged Index value as the dummy inlineSite identifier.

Why is this an issue? What problem does this cause? Conceptually, if we want to give them a tree structure, top level inliner are indeed children of dummy root.

Yes, top level inliner is still children of dummy root with this change. However, they do not share dummy callsite id any more. Note that without this change, the dummy callsite id for top-level inliner (Index below) isn't updated correctly.

if (Root != Cur) {
   Index = readUnsignedNumber<uint32_t>();
 }

Yeah, index would always be zero. But what problem does this cause? It seems that the call site id for top-level inliner is a dummy field anyways, so value should be ignored. Currently it works just fine, right? Is it also motivated by the upcoming change to report dangling probes with counts (so you need to different call site under dummy root)? If so, it might make sense to include it there as well..

The index starts from zero, and could be updated by parsing an inlinee. And when it comes back to the next top-level inliner, the value index is the leftover from the previous inlinee. This causes top-level inliner frames non-deterministically shared. The fix is motivated by upcoming dangling probe work but I feel that it may also affect inliner context building for shared top-level frames. I'll see if I can find a case for that. Otherwise I'll include this fix in the dangling probe patch as well.

I see. If it's not always zero, then it's not good.. In that case, we could fix it here setting it to 0, or change it like what you have and group it with the upcoming change that relies on call site id being differentiator.

They could be used to visit all probes that are associated to an inline frame, which is something else I'll be working on.

We do have containers that can be re-allocated for growth, and doing that itself isn't an issue, unless it breaks specific use case. With what we have now we should not store pointers to inlinee's probe, and we're indeed doing that right now. So there's not bug, right?

If there's a future change that require changing this model (of not keeping probe pointers), perhaps make the vector->list change together with the change that has the new use case?

It doesn't break anything so far since PseudoProbeInlineTree.ProbeVector is not used anywhere. It might also be good to remove it for now since the container might end up with invalid pointers.

Probe pointers isn't an issue as long as they are valid. I'm using std::list instead of std::vector to store real probe objects so that taking address of them is never invalidated.

I'm actually working on a feature that reports all dangling probes in an inline frame into a profile, even if there are no samples collected for them. As of now only dangling probes with samples are reported. Since samples collected on a dangling probe doesn't make sense (dangling probes can be placed anywhere), they should always be reported so that the counts inferencer never overlooks them. It makes sense to include the current patch into that work.

For dangling probe with counts, we currently use the special value (indicating dangling) instead of the actual count for output, right? What is the new behavior with upcoming change?

The new behavior lets you always report dangling probe regardless of whether there're samples collected for them. Dangling probes may be placed anywhere and if they are accidentally placed next to an instruction not executed, currently the dangling probe will not be reported. And when it comes to the counts inferencer, it won't know the probe is dangling. It'll treat the probe as never executed.

Makes sense, thanks for clarification.

hoy retitled this revision from [CSSPGO][llvm-profgen] Fixing an obselete iterator issue. to [CSSPGO][llvm-profgen] Always report dangling probes for frames with real samples..Apr 19 2021, 7:06 PM
hoy edited the summary of this revision. (Show Details)
hoy updated this revision to Diff 338682.Apr 19 2021, 7:08 PM
hoy edited the summary of this revision. (Show Details)

Updating D100235: [CSSPGO][llvm-profgen] Always report dangling probes for frames with real samples.

hoy updated this revision to Diff 338953.Apr 20 2021, 12:06 PM

Updating D100235: [CSSPGO][llvm-profgen] Always report dangling probes for frames with real samples.

llvm/tools/llvm-profgen/PseudoProbe.cpp
217

Fixed.

llvm/tools/llvm-profgen/PseudoProbe.h
75

Sounds good.

wlei added inline comments.Apr 20 2021, 11:22 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
518

Try to understand this.

If pass2's probe is dangling, then return 0 no matter what in the profile(pass1);

If pass2's probe is not dangling and reading from profile gets a dangling probe(pass1), then return invalid count(UINT64_MAX), (this case happen because later the BB of the probe will be removed)

is that right?

So this change make sure all the future removed BB will be identified as dangling so that the inference algorithm can work on this?

is this issue also coming from the rebalancing algorithm?

like we have TrueBranch :0 and FalseBranch: 0, the rebalancing algorithm can't work on this.

but if we have TrueBranch:0 and FalseBranch:dangling, then it can be rebalanced.

wenlei accepted this revision.Apr 21 2021, 8:56 AM

lgtm, thanks!

llvm/tools/llvm-profgen/ProfileGenerator.cpp
518

By return, you meant the return value of SampleProfileLoader::getProbeWeight?

For inference, I think what matters is whether probe is dangling from pass1.

llvm/tools/llvm-profgen/PseudoProbe.h
48–51

nit: a more reflective type name, something like InlinedProbesMap, InlinedProbeTreeMap?

This revision is now accepted and ready to land.Apr 21 2021, 8:56 AM
hoy added inline comments.Apr 21 2021, 9:50 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
518

Here is only about pass1. llvm-profgen runs on pass1 binary and perf data. A dangling probe here is from pass1.

When the dangling-ness is imported in pass2, the compiler has to guess its sample count instead of trusting what's collected in pass1. When the same probe is seen also dangling in pass2 compilation, it indicates that the code being probed is removed by pass2, therefore, no matter what samples pass1 collected for it, it shouldn't matter since pass2 is no longer interested in the probe.

The issue was noticed when a dangling probe was not reported dangling, thus the counts inferencer trust its zero sample count instead of inferring a count for it. Though I didn't see it directly related to rebalancing, I believe it will fix your example case.

llvm/tools/llvm-profgen/PseudoProbe.h
48–51

InlinedProbeTreeMap sounds good.

hoy updated this revision to Diff 339285.Apr 21 2021, 9:52 AM

Renaming ChildrenType to InlinedProbeTreeMap.

wlei accepted this revision.Apr 21 2021, 10:54 AM

LGTM, thanks for the fix.

llvm/tools/llvm-profgen/ProfileGenerator.cpp
518

By return, you meant the return value of SampleProfileLoader::getProbeWeight?

Yes, I mean when get the value from pass2.

The issue was noticed when a dangling probe was not reported dangling, thus the counts inferencer trust its zero sample count instead of inferring a count for it. Though I didn't see it directly related to rebalancing, I believe it will fix your example case.

I see, thanks for the helpful explanation.

This revision was landed with ongoing or failed builds.Apr 21 2021, 6:08 PM
This revision was automatically updated to reflect the committed changes.