Updates branch_weights annotation for call instructions during inlining.
ClosedPublic

Authored by danielcdh on Wed, Mar 8, 5:11 PM.

Details

Summary

Inliner should update the branch_weights annotation to scale it to proper value.

danielcdh created this revision.Wed, Mar 8, 5:11 PM
eraman requested changes to this revision.Thu, Mar 9, 2:06 PM
eraman added inline comments.
include/llvm/IR/Instruction.h
257

Do not use double here and instead specify it as a fraction.

lib/IR/Instruction.cpp
691

What are the assumptions on the MD_prof metadata here? Could you assert V here? If it is indeed possible that ith operand is not constant int, could the next operand be constant int (in which case you shouldn't return here)

lib/Transforms/Utils/InlineFunction.cpp
1428–1454

Document this method. Also, perhaps choose a more meaningful name.

1443

No need to keep track of the cloned blocks and update the calls inside them. VMap also includes the instructions - so just check if they are calls and update the weights.

1454

There is a subtle issue here. Let's say there is a block in the callee that is not executed in this call context and that block has a call instruction. You'll still be reducing the weight of that call even though you haven't cloned it. Similar issue does exist in updating the bfi incrementally, but that's okay because BFI does get recomputed later. In this case, once the weight gets incorrectly updated it remains so forever. What you should be doing is update the weights only when the call has been cloned.

1470

This code that gets either the metadata weight or the block weight should be refactored and moved out of here.

1477

This gets confusing. You're now updating the entry count based on the metadata. When sample profile is used, is it expected that the sum of the calls' profile weights equal the entry count? Do you smooth out the entry count based on call instructions' profile weight when you initially load the profile. In any case, more comments are helpful.

test/Transforms/Inline/prof-update.ll
2

Please expand this test or add a new test case to handle more complex cases where some calls are not cloned into the caller.

This revision now requires changes to proceed.Thu, Mar 9, 2:06 PM
danielcdh updated this revision to Diff 91454.Fri, Mar 10, 10:13 PM
danielcdh marked 8 inline comments as done.

update

danielcdh added inline comments.Fri, Mar 10, 10:13 PM
lib/IR/Instruction.cpp
691

removed the check.

lib/Transforms/Utils/InlineFunction.cpp
1477

Code updated. Yes, the sample profile collection makes sure the function entry count meet with all call edges.

Let me know if you think there still need to add comments.

eraman added inline comments.Mon, Mar 13, 5:30 PM
lib/IR/Instruction.cpp
693

For counts from instrumentation based profile, I use a 128 bit APInt to do the arithmetic and then educe it to 64 bit. In this case, since the values being multiplied are sample profile weights, I suppose the likelihood of Val * S overflowing 64 bits is very small and the APInt may be unnecessary. Even then, use saturated multiply so that the values are at least sensible even in the unlikely case of overflow.

lib/Transforms/Utils/InlineFunction.cpp
1441

You have to check if Entry.second is a call. It is possible for call instructions to be simplified to a value during the cloning.

1447

Useful to add a comment as to why this is done only when the instruction is in VMap.

test/Transforms/Inline/prof-update.ll
2

Nit: I prefer writing comments to explain why I expect the values in the CHECK statements. This helps anyone trying to modify the test later.

danielcdh updated this revision to Diff 91663.Mon, Mar 13, 8:04 PM
danielcdh marked 4 inline comments as done.

update

eraman added inline comments.Thu, Mar 16, 4:57 PM
lib/IR/Instruction.cpp
694

Perhaps it is not clear above, but SaturatingMultiply may be sufficient in your case. I'm ok with using APInt as well as in the most common case (multiplication result fits within 64 bits), the division is not expensive. Perhaps add a comment above the udiv stating that this could potentially be expensive, but most likely the product is going to fit within 64 bits.

lib/Transforms/Utils/InlineFunction.cpp
1450

The check above is not needed. After all, if the block is reachable in the call context, all instructions inside are reachable too. But there is one tricky case. Let's say that a call has been simplified into a value (this can happen during pruning), Should you update the call count or not? If you should not update, then the check above should be isa<CallInst>(VMap.Count(CI)). I think updating is the right thing here though.

zzheng added a subscriber: zzheng.Thu, Mar 16, 5:45 PM

If X is Optional, please use X.has_value() to check if it has value and X.getValue() to check for 0 value.

lib/Transforms/Utils/InlineFunction.cpp
1433

Do you mean to use !CalleeEntryCount.has_value()?

1437

I'm confused, do you mean to test if CallSiteCount has no value or it has a value of 0?

1441

Isn't it redundant to check ptr != nullptr?

Is there any chance that Entry.second is NULL and what happen on (*Entry.second) in that case?

danielcdh updated this revision to Diff 92152.Fri, Mar 17, 9:29 AM
danielcdh marked 4 inline comments as done.

update

lib/Transforms/Utils/InlineFunction.cpp
1441

It indeed happened once when I compile large code, that's why I added this check. I'm not familiar with cloning code, Easwaran may have some insights on why this could be nullptr? Or it's a bug in function clone?

1450

The prof count only attaches to branch instructions. For the case you mentioned, it's already checked by the "if (CallInst *CI = dyn_cast<CallInst>(&I))", thus I simply removed the second check.

a couple more nitpicks... please check all use of Optional variables

lib/Transforms/Utils/InlineFunction.cpp
1466

!CalleeCount.hasValue()

1470

!CallCount.hasValue()

danielcdh updated this revision to Diff 92170.Fri, Mar 17, 11:04 AM
danielcdh marked 2 inline comments as done.

update

eraman accepted this revision.Fri, Mar 17, 11:55 AM

LGTM

lib/Transforms/Utils/InlineFunction.cpp
1441

The value type of the VMap is WeakVH and the value it holds becomes null when it gets destroyed.

This revision is now accepted and ready to land.Fri, Mar 17, 11:55 AM
danielcdh closed this revision.Mon, Mar 20, 9:52 AM