The inliner should update the branch_weights annotation to scale it to the proper value.
Do not use a double here; instead, specify it as a fraction.
What are the assumptions on the MD_prof metadata here? Could you assert V here? If it is indeed possible that the ith operand is not a constant int, could the next operand be a constant int (in which case you shouldn't return here)?
Document this method. Also, perhaps choose a more meaningful name.
There is no need to keep track of the cloned blocks and update the calls inside them. VMap also includes the instructions, so just check whether they are calls and update the weights.
There is a subtle issue here. Say there is a block in the callee that is not executed in this call context, and that block has a call instruction. You'll still be reducing the weight of that call even though you haven't cloned it. A similar issue exists when updating the BFI incrementally, but that's okay because BFI gets recomputed later. In this case, once the weight is incorrectly updated it stays wrong forever. You should update the weights only when the call has actually been cloned.
This code that gets either the metadata weight or the block weight should be refactored and moved out of here.
This gets confusing. You're now updating the entry count based on the metadata. When a sample profile is used, is it expected that the sum of the calls' profile weights equals the entry count? Do you smooth out the entry count based on the call instructions' profile weights when you initially load the profile? In any case, more comments would be helpful.
Please expand this test or add a new test case to handle more complex cases where some calls are not cloned into the caller.
For counts from an instrumentation-based profile, I use a 128-bit APInt to do the arithmetic and then reduce it to 64 bits. In this case, since the values being multiplied are sample profile weights, I suppose the likelihood of Val * S overflowing 64 bits is very small and the APInt may be unnecessary. Even then, use a saturating multiply so that the values are at least sensible in the unlikely case of overflow.
You have to check whether Entry.second is a call. It is possible for call instructions to be simplified to a value during cloning.
It would be useful to add a comment as to why this is done only when the instruction is in VMap.
Nit: I prefer writing comments to explain why I expect the values in the CHECK statements. This helps anyone trying to modify the test later.
Perhaps it is not clear above, but SaturatingMultiply may be sufficient in your case. I'm okay with using APInt as well, since in the most common case (the multiplication result fits within 64 bits) the division is not expensive. Perhaps add a comment above the udiv stating that it could potentially be expensive, but most likely the product is going to fit within 64 bits.
The check above is not needed; after all, if the block is reachable in the call context, all instructions inside it are reachable too. But there is one tricky case. Say a call has been simplified into a value (this can happen during pruning). Should you update the call count or not? If you should not update, then the check above should be isa<CallInst>(VMap.lookup(CI)). I think updating is the right thing here, though.
If X is an Optional, please use X.has_value() to check whether it has a value, and X.getValue() to check for a value of 0.
Do you mean to use !CalleeEntryCount.has_value()?
I'm confused: do you mean to test that CallSiteCount has no value, or that it has a value of 0?
Isn't it redundant to check ptr != nullptr?
Is there any chance that Entry.second is null, and what happens on (*Entry.second) in that case?
It did indeed happen once when I compiled a large codebase; that's why I added this check. I'm not familiar with the cloning code; Easwaran may have some insight into why this could be nullptr, or whether it's a bug in function cloning.
The prof count only attaches to branch instructions. For the case you mentioned, it's already checked by the "if (CallInst *CI = dyn_cast<CallInst>(&I))", thus I simply removed the second check.