Page MenuHomePhabricator

[Profile] backward propagate profile data in jump-threading
AcceptedPublic

Authored by davidxl on Aug 17 2017, 11:15 PM.

Details

Reviewers
chandlerc
Summary

If a branch condition is defined by a function call result, the profile information for the branch may be (effectively) lost after inlinling.

For instance,

 bool cond() {
      if (a)
        return true;
      if (b)
        return true;
     return false;
 }

void test() {
    if (__builtin_expect(cond(), false) ) {
         bar();  // cold
     }
   ..
 }

After inlining and jump threading, the first branch in the inline instance does not have profile data -- this will make the block for bar() look hot.

In this patch, the problem is fixed in jump threading by 'backward' propagate the BP to the predecessor given the conditional probability discovered.

Diff Detail

Event Timeline

davidxl created this revision.Aug 17 2017, 11:15 PM
chandlerc edited edge metadata.Aug 18 2017, 1:40 AM

Really, really nice find here. A bunch of comments below, but they're all pretty minor on the whole.

lib/Transforms/Scalar/JumpThreading.cpp
159

Would be good to connect this comment to the arguments and behavior of this function.

(Don't get me wrong, this comment is *excellent* for explaining why we can do this back propagation and how to compute it! I just also want comments that explain how this routine fits into the model.)

160

Update -> update

175

You can use return {IncomingBB, PhiBB}; here and elsewhere to return. You'll have to specify the return type in the lambda, but then you won't even need to do that for nullptrs...

176–177

Why just one? You could do:

auto *PredBB = IncomingBB;
while (auto *SinglePredBB = PredBB->getSinglePredecessor())

PredBB = SinglePredBB;

at the top fo the function, and have a single set of logic to handle the whole thing?

189–192

Can simplify as:

if (!CI || !CI->getType()->isIntegerTy(1))
  continue;
194

This code only handle i1 PHIs that directly feed conditional branches, right?

Provided that's so, I would assert that CI is a 1 bit integer here, and then test whether it the null value or not rather than zexting and extracting it.

200

No need to compare with nullptr, pointers are testable directly:

if (PredOutEdge.first)
207

"we" -> "We"

I wonder, since what we have is fundamentally an upper bound, would it make sense to go ahead and apply this even when there is existing profile info?

Even for static hints, I can see this happening. For example, the pre-inlining code might have some relatively weak hint that is, after inlining, bounded by a powerful hint.

Or is your concern doing this without measuring the impact for PGO? If that's why, I would make that more clear -- we *intend* to always use this to refine weights, but PGO requires more careful tuning to make sure this doesn't regress anything.

Otherwise, I'd just go ahead and implement this as clamping any existing weights.

208–209

"With PGO, this can be turned on too to refine the profile with" -> "With PGO, this can be used to refine even existing profile data with"

213

I would remind the reader that this is because all we have is an upper bound.

davidxl marked 6 inline comments as done.Aug 18 2017, 10:32 AM
davidxl added inline comments.
lib/Transforms/Scalar/JumpThreading.cpp
159

Annotated the comments with variable names in the code.

194

There is an early check of one bit type. I am not clear what is the suggestion about (regarding extracting value).

207

Yes, the latter . I have seen issues with PGO so without more testing, I will leave this for nonPGO case for now.

davidxl updated this revision to Diff 111700.Aug 18 2017, 10:33 AM

Addressed Chandler's review feedbacks

chandlerc accepted this revision.Aug 18 2017, 2:50 PM

Awesome. LGTM with the minor things below addressed. Let me know if any of this isn't make sense.

lib/Transforms/Scalar/JumpThreading.cpp
163

nit: spurious blank line...

176

nit: clang-format to fix line length

194

Essentially...

assert(CI.getBitWidth() == 1 && "We only handle i1 constants here.");
BP = CI->isOne() ? ... : ...;

You could also use CI->isZero(), but one seems more clear.

208–209

This wording improvement for the comment would still be helpful I think.

This revision is now accepted and ready to land.Aug 18 2017, 2:50 PM
davidxl updated this revision to Diff 112064.Aug 21 2017, 2:59 PM

Fix an obvious bug.

Also improved test cases to cover the path that exposed the bug.

chandlerc accepted this revision.Aug 21 2017, 8:00 PM

Cool! With the changes below, LGTM.

Any further testing you think this needs before relanding it? (I don't really have good ideas other than to patch it in internally and build some large things with PGO and ThinLTO)

lib/Transforms/Scalar/JumpThreading.cpp
184–188

I think this will be easier to read inverted:

auto *SinglePredBB = PredBB->getSinglePredecessor();
if (!SinglePredBB)
  return {nullptr, nullptr};

SuccBB = PredBB;
PredBB = SinglePredBB;
189

nit: When infinitely looping, I find for (;;) { a shorter and still easy to read idiom.

190–191

No need for this once the loop above returns directly when exiting.

davidxl updated this revision to Diff 112111.Aug 21 2017, 9:40 PM

Addressed review feedback.

Tested with clang self build with PGO.