Page MenuHomePhabricator

[llvm][misexpect] Add tests for sample profiling
Needs ReviewPublic

Authored by paulkirth on Apr 26 2022, 2:49 PM.

Details

Reviewers
tejohnson
Summary

MisExpect was occasionally crashing under SampleProfiling, due to a division by zero.
We worked around that in D124302 by changing the assert to an early return.
This patch is intended to add a test case for the crashing scenario and
re-enable MisExpect for SampleProfiling.

Diff Detail

Event Timeline

paulkirth created this revision.Apr 26 2022, 2:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2022, 2:49 PM
paulkirth updated this revision to Diff 431527.May 23 2022, 5:04 PM

use smaller test-case for crashing input.

As a general note, the test here will fail by design with the assertion in MisExpect.cpp re-enabled. That is the case I'm trying to fix/understand and address. We should probably still land the test, but the assertion will have to be removed unless we figure out a way to address the underlying issues.

paulkirth published this revision for review.May 23 2022, 5:36 PM
paulkirth added a reviewer: tejohnson.

@tejohnson I have a test case that encapsulates when SampleProfiling crashes we saw w/ misexpect a while ago.

What I've been trying to figure out is how a profile ever gets zero weights like this.

My understanding of the PGO pipeline is that there are 3 modes where profiling weights are added: Front End Instrumentation inserted by clang, IR profiles, and SampleProfiling. My understanding was also that these modes were incompatible with one another, so you wouldn't have weights added by multiple sources, e.g., IR and Sample Profiling.

Given that model, I designed MissExpect to infer based on the profiling mode, if an existing weight should be interpreted as added by an llvm.expect intrinsic or not. Both sample profiles, and IR profiles occur after llvm.expect intrinsics are lowered, so they assume any existing weights are from annotations. When using Clang based instrumentation we assume that all existing weights come from profiles and we instead do the check when llvm.expect intrinsics are lowered. Given that model, I'm not sure how a program ever gets a set of branch weights that are all zeros.

Our checks for MisExpect in Sample profiles examine the branch weights on an instruction just prior to the profiling weight being added. There shouldn't be an opportunity for branch weights to be added to the instruction prior to this point, except from intrinsic lowering. We know that expect intrinsics won't do that, so we can safely ignore that possibility.

The only other scenario I can think of is that profiles are being mixed (e.g. Sample + IR) or that old bitcode with existing weights is being compiled again. If either case is legal then I we may need to consider a new design that can handle those scenarios.

Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 5:37 PM
paulkirth edited the summary of this revision. (Show Details)May 23 2022, 5:41 PM

@tejohnson I have a test case that encapsulates when SampleProfiling crashes we saw w/ misexpect a while ago.

Won't this patch if committed trigger the failure again?

What I've been trying to figure out is how a profile ever gets zero weights like this.

Can you use the original failing test case to dig into this more? Looks like it was XFDO from the internal Google bug. Reach out to me internally if you need help reproducing that.

My understanding of the PGO pipeline is that there are 3 modes where profiling weights are added: Front End Instrumentation inserted by clang, IR profiles, and SampleProfiling. My understanding was also that these modes were incompatible with one another, so you wouldn't have weights added by multiple sources, e.g., IR and Sample Profiling.

I believe it is possible to combine IR and SamplePGO profiles, but it doesn't look like that is the case in the failing test case.

Won't this patch if committed trigger the failure again?

Yes, it will. I'm still trying to figure the situation out, but I wanted to create a case that triggers the crash with the assert, and determine if we can fix the underlying problem before going back to early return.

Can you use the original failing test case to dig into this more? Looks like it was XFDO from the internal Google bug. Reach out to me internally if you need help reproducing that.

I used that case to determine that the underlying issue is that branch weights are added that have 0 total weight across all branches. I even ran llvm-reduce over the internal example, and it came out remarkably similar to the misexpect-zero.ll test.

I believe it is possible to combine IR and SamplePGO profiles, but it doesn't look like that is the case in the failing test case.

I went back and reviewed the code in SampleProfile, and I think I understand the source of the bug, but I'm not sure exactly how to solve it completely.

When I reimplemented MisExpect, I failed to notice that part of SampleProfiling behaves very differently than when I first implemented it in 2019.

The comment in SampleProfile.cpp just below the call into misexpect illustrates the problem:

// OverwriteExistingWeights. In ThinLTO, the profile annotation is done
// twice. If the first annotation already set the weights, the second pass
// does not need to set it. With OverwriteExistingWeights, Blocks with zero
// weight should have their existing metadata (possibly annotated by LTO
// prelink) cleared.

Looking at the blame, this was changed back in 2021, and I missed that difference when I started working on this again. Previously there was no mention of ThinLTO here, and I failed to account for that scenario and the fact that an existing branch weight may have an origin other that form a lowered llvm.expect intrinsic.

An option to work around this is to only do MisExpect checking when MaxWeight > 0, which will ensure that at least one branch weight will be non-zero.

It doesn't solve the issue that there are times when a pre-existing weight doesn't imply that it comes from llvm.expect, and I really don't have a good answer to how we can address that (and also the fact that you can mix IR and sample profiling).

One option would be to augment branch weight metadata to identify how it was added: via profile or annotation. I'm not convinced that's a great idea, since it would require updates to all the parts of llvm that add branch weights., and increases the size of metadata for a single diagnostic.

So after looking at this for a while, I think the core issue is that MisExpect needs some kind of provenance information about whether or not the branch weight originates from an llvm.expect intrinsic. If we could know that, then MisExpect diagnostics should be correct even if you run several profiling passes or combine different types of profiling, since we would only report issues on branch weights that had the correct provenance.

I see two basic ways to do that:

  1. Add a new field to MD_prof that carries the provenance
  2. Add an external piece of metadata that describes the same thing

I think the best option is to simply add that information into the branch weight metadata directly, so add an extra field that is an enum or another piece of metadata that describes whether or not it comes from an expect intrinsic. That would avoid inventing any additional machinery to correctly update the branch weight and also some additional metadata that may be there.

Changing the layout of MD_prof seems to be problematic though, especially given that we need to maintain backwards compatibility. I think the normal way to transition these changes is to introduce a replacement metadata and use that everywhere instead of the old version. the bitcode reader can then automatically update the old MD -> the new one. This seems like it may be a lot to do, given that all LLVM tests that have branch weights would need to be updated (at least the FileCheck portions if nothing else). That can probably be done w/ a good regex, but its a large invasive change which is enough to make me hesitant.

@tejohnson what are your thoughts on this approach? Do you think this would be a good direction to pursue? Are there other factors that I've failed to consider?

paulkirth updated this revision to Diff 450409.Fri, Aug 5, 2:33 PM

Rebase and restore assertion.

A follow up patch will try to address the problem of provenance tracking.

paulkirth updated this revision to Diff 450449.Fri, Aug 5, 4:42 PM

Remove FIXME, since we're re-enabling the call to checkExpectAnnotations