This is an archive of the discontinued LLVM Phabricator instance.

Fine tuning of sample profile propagation algorithm.
ClosedPublic

Authored by danielcdh on Aug 5 2016, 1:29 PM.

Diff Detail

Event Timeline

danielcdh updated this revision to Diff 67009.Aug 5 2016, 1:29 PM
danielcdh retitled this revision from to Fine tuning of sample profile propagation algorithm..
danielcdh updated this object.
danielcdh added reviewers: dnovillo, davidxl.
danielcdh added a subscriber: llvm-commits.
dnovillo edited edge metadata.Aug 8 2016, 8:41 AM

Could you describe this in *much* more detail?

  • What changes?
  • What's the idea behind the change?
  • High-level overview of the change.

It's really hard to review dry-code without the motivation behind the changes and a description.

The changes includes:

  • Not use branch instruction and intrinsic instruction for annotation because their debug info is usually incorrect.
  • Hoist the check of whether a callsite was inlined in the profile but not in annotation. Because we are confident that these callsites are cold.
  • If any BB in an equivalence class is annotated, set the entire equivalence class as annotated.
  • When setting the wait, add the weight by one to ensure all weights are at least 1. This is trying to avoid propagation error when the weight is 0.
  • Add a propagateThroughEdge pass with all edge weights reset to 0 to recompute edge weight from propagated block weights.
  • Add a propagateThroughEdge pass in which basic block count can amended when it's apparently incorrect.
  • If computed unknown edge weight exceeds its pred/succ block's weight, reduce its weight to pred/succ block's weight
  • When block count is 0, all its pred/succ edge count is set to 0
  • Adjust block header weights to be no less than all basic blocks inside the loop

These changes are all from tuning google applications for a long period of time, so it will be hard to get unittests to show effects for each one of them. But they combined to show big difference: for the legacy unittests, the changes have corrected some incorrect annotations.

Apologies for the delay, Dehao. Thanks for the description. That needs to be added to the code, so we have it for future reference. I suspect that several of the spots where I was confused and asked for comments are going to be good candidates to sprinkle that high-level description around.

Thanks.

lib/Transforms/IPO/SampleProfile.cpp
463–466

Why are we ignoring branches? IS it because they don't matter for weight calculation? Should we return 0 for them, or simply ignore them?

474

And this is because we will redo that inlining decision later, right? What about the inlining decisions we decide not to redo?

810

s/could/should/ here, right?

907

Convention is camel case for locals => OtherEC

921

Comment here, please. What is this doing?

1014

Comment here, please. What is this doing?

1039

The block below also needs some commenting. Why the multiple stages of propagation? Can this be re-factored a bit?

danielcdh updated this revision to Diff 67792.Aug 11 2016, 8:19 PM
danielcdh marked 4 inline comments as done.
danielcdh edited edge metadata.

add more comments

lib/Transforms/IPO/SampleProfile.cpp
463–466

Because branch instruction's debug info are usually attributed to sources outside the basic block. So we simply ignore all branches when annotating.

474

At this point, all necessary inlining has been redone. If we decided not to redo it, it means it's hot, then we will mark it's count as 0 to prevent from getting inlined in later inlining phase.

1039

Comments added. I think this just calls the same function 3 times, not sure if using a loop to do it three times would simplify the code much.

dnovillo accepted this revision.Aug 12 2016, 6:53 AM
dnovillo edited edge metadata.

Thanks! LGTM.

This revision is now accepted and ready to land.Aug 12 2016, 6:53 AM
danielcdh closed this revision.Aug 12 2016, 9:30 AM