This is an archive of the discontinued LLVM Phabricator instance.

[PGO] Annotate branch_weight for invoke instruction
AcceptedPublic

Authored by mingmingl on Sep 1 2022, 9:24 AM.

Details

Summary

Invoke instruction can have both branch_weights and indirect-call VP count.
We currently only have indirect-call VP count for it.
This patch enables branch_weight metadata for invoke instruction.

Diff Detail

Event Timeline

xur created this revision.Sep 1 2022, 9:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 1 2022, 9:24 AM
xur requested review of this revision.Sep 1 2022, 9:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 1 2022, 9:24 AM

Can you simplify the test case (e.g. stripping the debug info)?

hoy added inline comments.Sep 1 2022, 3:53 PM
llvm/include/llvm/IR/Instructions.h
3969

nit: metadat -> metadata

3973

I'm wondering if this function can be made more generic and not restrict to invoke instruction only. I feel it's likely in the future that more types of metadata will be allowed for an instruction. How about returning a something like a hash map with the keys being the type of metadata?

4011

nit: the return nullptr isn't necessary as BranchWeightMD will be set to null if the branch metadata isn't there? Similarly below.

llvm/lib/IR/Metadata.cpp
1512

This seems to change existing behavior for invokes from legacy IR that have only VP metadata.

wenlei added a comment.Sep 1 2022, 4:39 PM

Why is invoke unique here? Can regular call never have both VP and branch weights? Ideally we'd avoid special case for a narrow type of instruction.

xur marked 2 inline comments as done.Sep 5 2022, 4:17 PM
xur added inline comments.
llvm/include/llvm/IR/Instructions.h
3973

As of now, the branch weight metadata is only attached to terminator instructions in FDO.
In sampleFDO, a branch weight can also be attached to a direct call.
VP metadata (more specifically, indirect call metadata) can only be with indirect-call or invoke instruction.
So in current code, we only have this situation for invoke instruction. This function can be written in a more generic way. But there is no use case for now.
I prefer keep this simple form and add a TODO (and expand when there is a use case). What do you think?

4011

that's correct. Fixed.

xur marked an inline comment as done.Sep 5 2022, 4:20 PM

Why is invoke unique here? Can regular call never have both VP and branch weights? Ideally we'd avoid special case for a narrow type of instruction.

In FDO, branch weights only attached to terminal instruction (and call instruction is not, while invoke is). Indirect-call VP only attached to indirect-call or invoke instruction.
In SampleFDO, we also attach a branch weight to direct-call. But again indirect-call only only for indirect-call and invoke.
So only invoke instruction can have both profile metadata.

xur updated this revision to Diff 458068.Sep 5 2022, 4:38 PM

Integrated with review comments.

xur added a comment.Sep 5 2022, 4:38 PM

Can you simplify the test case (e.g. stripping the debug info)?

Test case simplified in the new patch set.

hoy added inline comments.Sep 6 2022, 10:51 AM
llvm/include/llvm/IR/Instructions.h
3973

Sounds good for now.

llvm/lib/IR/Metadata.cpp
1512

How about this? Should both branch_weight and vp metadata be checked here to be downwards compatible?

xur added inline comments.Sep 6 2022, 11:23 AM
llvm/lib/IR/Metadata.cpp
1512

Sorry that I missed this comment.

Yes. you are right on this. Currently we return the VP counts as the total count.
This does create a backward compatibility issue if we use an older profile.

I will check both VP and branch_weights.

xur updated this revision to Diff 458234.Sep 6 2022, 11:41 AM

Integrated Hongtao's suggestion.

hoy added a comment.Sep 6 2022, 1:28 PM

Thanks for the change. Wondering if you saw performance improvement with it.

The corresponding change on the sample profile side isn't included, do you have any plan for it?

llvm/lib/ProfileData/InstrProf.cpp
1077

Can we use get instead of getDistinct so that invokes with same metadata can potentially share the same node?

llvm/lib/Transforms/Utils/Local.cpp
2168

Should the check be if (ProfileData)? If the original invoke doesn't have !VP, the new call should end up with no metadata based on the existing behavior.

xur added inline comments.Sep 7 2022, 8:09 AM
llvm/lib/ProfileData/InstrProf.cpp
1077

Yes. get() seems to be better suited here.

llvm/lib/Transforms/Utils/Local.cpp
2168

If the original invoke does not have VP, it can still have branch_weights.

Maybe my words in patch description is confusing: this patch tries to fix the case where invoke calls to a indirect function.
If the argument of invoke is a direct function -- it will only have a branch_weight and we are doing the right annotations.

For this transformation, mostly like the invoke is a direct function, in which case, we will annotate a total counts -- FDO will not use
this metadata. AutoFDO might use this in the inliner.

If here the newcall is an indirect function, we have have two annotations in invoke and can only have one in the new call. I choose to keep the indirect metadata. This changes the semantic but I think it's the right thing to do.

xur added a comment.Sep 7 2022, 8:19 AM

Thanks for the change. Wondering if you saw performance improvement with it.

The corresponding change on the sample profile side isn't included, do you have any plan for it?

Yes. For AutoFDO, we still need some change for the invoke in sampleloader
The situation is slightly different in AutoFDO. In FDO, we do branch_weight annotation first, then do the VP.

In AutoFDO, we do VP annotation first and then branch_weight annotation. We need to set the branch_weight metadata where there is already a VP metadata (which is the reverse of FDO).
I was planning to have a separated patch for AutoFDO, but I can also include it in this patch.

hoy added a comment.Sep 7 2022, 8:56 AM

Thanks for the change. Wondering if you saw performance improvement with it.

The corresponding change on the sample profile side isn't included, do you have any plan for it?

Yes. For AutoFDO, we still need some change for the invoke in sampleloader
The situation is slightly different in AutoFDO. In FDO, we do branch_weight annotation first, then do the VP.

In AutoFDO, we do VP annotation first and then branch_weight annotation. We need to set the branch_weight metadata where there is already a VP metadata (which is the reverse of FDO).
I was planning to have a separated patch for AutoFDO, but I can also include it in this patch.

A separate patch sounds good, thanks.

llvm/lib/Transforms/Utils/Local.cpp
2168

I see, thanks for the explanation.

xur added a comment.Sep 7 2022, 9:12 AM

Thanks for the change. Wondering if you saw performance improvement with it.

The corresponding change on the sample profile side isn't included, do you have any plan for it?

Yes. For AutoFDO, we still need some change for the invoke in sampleloader
The situation is slightly different in AutoFDO. In FDO, we do branch_weight annotation first, then do the VP.

In AutoFDO, we do VP annotation first and then branch_weight annotation. We need to set the branch_weight metadata where there is already a VP metadata (which is the reverse of FDO).
I was planning to have a separated patch for AutoFDO, but I can also include it in this patch.

A separate patch sounds good, thanks.

AutoFDO is less problematic as branch_weights metadata overwrites VP in the end. (and many VP metadata is already consumed by the prompting in preinliner).

In FDO, VP metadata overwrites branch_weights -- This is bad as we lost more information in BPI/BFI.
(branch_weights is more important for downstream opt).

hoy accepted this revision.EditedSep 7 2022, 9:17 AM

Thanks for the change. Wondering if you saw performance improvement with it.

The corresponding change on the sample profile side isn't included, do you have any plan for it?

Yes. For AutoFDO, we still need some change for the invoke in sampleloader
The situation is slightly different in AutoFDO. In FDO, we do branch_weight annotation first, then do the VP.

In AutoFDO, we do VP annotation first and then branch_weight annotation. We need to set the branch_weight metadata where there is already a VP metadata (which is the reverse of FDO).
I was planning to have a separated patch for AutoFDO, but I can also include it in this patch.

A separate patch sounds good, thanks.

AutoFDO is less problematic as branch_weights metadata overwrites VP in the end. (and many VP metadata is already consumed by the prompting in preinliner).

In FDO, VP metadata overwrites branch_weights -- This is bad as we lost more information in BPI/BFI.
(branch_weights is more important for downstream opt).

It looks like same happens to autofdo, i.e, no branch_weights assigned to indirect invokes? https://github.com/llvm/llvm-project/blob/7b81a81d5f9cccb1b091cfc5264bc483b0acc83a/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1628

Oh, I guess branch_weight is attached to invokes that are promoted from an indirect invoke, so it is better handled than the FDO case.

llvm/lib/ProfileData/InstrProf.cpp
1077

LGTM after this is fixed.

This revision is now accepted and ready to land.Sep 7 2022, 9:17 AM
wenlei added inline comments.Sep 7 2022, 10:21 AM
llvm/include/llvm/IR/Instructions.h
3973

A slight preference for this to be generic. The problem you're dealing with is the unordered nature of MD. Previously it works because there's the assumption that only one profile MD exists, which is not true for invoke.

It's cleaner and more robust if you simply make all APIs not depending on MD order. You don't need a hashmap, but how about just always search through MD when looking for branch_weights or VP. Specifically, that's changing the general implementation of extractBranchWeights and extractProfTotalWeight to be order independent regardless of the opcode.

xur added inline comments.Sep 7 2022, 10:35 AM
llvm/include/llvm/IR/Instructions.h
3973

OK. This makes sense. I'll update the patch with this suggestion.
Thanks!

xur added a comment.Sep 7 2022, 10:41 AM

Thanks for the change. Wondering if you saw performance improvement with it.

The corresponding change on the sample profile side isn't included, do you have any plan for it?

Yes. For AutoFDO, we still need some change for the invoke in sampleloader
The situation is slightly different in AutoFDO. In FDO, we do branch_weight annotation first, then do the VP.

In AutoFDO, we do VP annotation first and then branch_weight annotation. We need to set the branch_weight metadata where there is already a VP metadata (which is the reverse of FDO).
I was planning to have a separated patch for AutoFDO, but I can also include it in this patch.

A separate patch sounds good, thanks.

AutoFDO is less problematic as branch_weights metadata overwrites VP in the end. (and many VP metadata is already consumed by the prompting in preinliner).

In FDO, VP metadata overwrites branch_weights -- This is bad as we lost more information in BPI/BFI.
(branch_weights is more important for downstream opt).

It looks like same happens to autofdo, i.e, no branch_weights assigned to indirect invokes? https://github.com/llvm/llvm-project/blob/7b81a81d5f9cccb1b091cfc5264bc483b0acc83a/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1628

Oh, I guess branch_weight is attached to invokes that are promoted from an indirect invoke, so it is better handled than the FDO case.

the branch weight for invoke is set in line 1724 (https://github.com/llvm/llvm-project/blob/7b81a81d5f9cccb1b091cfc5264bc483b0acc83a/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1724)

line 1628 sets the call's branch-weight (which is a single number, call counts) -- this should be overwritten by line 1724.

hoy added a comment.Sep 7 2022, 10:48 AM

Thanks for the change. Wondering if you saw performance improvement with it.

The corresponding change on the sample profile side isn't included, do you have any plan for it?

Yes. For AutoFDO, we still need some change for the invoke in sampleloader
The situation is slightly different in AutoFDO. In FDO, we do branch_weight annotation first, then do the VP.

In AutoFDO, we do VP annotation first and then branch_weight annotation. We need to set the branch_weight metadata where there is already a VP metadata (which is the reverse of FDO).
I was planning to have a separated patch for AutoFDO, but I can also include it in this patch.

A separate patch sounds good, thanks.

AutoFDO is less problematic as branch_weights metadata overwrites VP in the end. (and many VP metadata is already consumed by the prompting in preinliner).

In FDO, VP metadata overwrites branch_weights -- This is bad as we lost more information in BPI/BFI.
(branch_weights is more important for downstream opt).

It looks like same happens to autofdo, i.e, no branch_weights assigned to indirect invokes? https://github.com/llvm/llvm-project/blob/7b81a81d5f9cccb1b091cfc5264bc483b0acc83a/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1628

Oh, I guess branch_weight is attached to invokes that are promoted from an indirect invoke, so it is better handled than the FDO case.

the branch weight for invoke is set in line 1724 (https://github.com/llvm/llvm-project/blob/7b81a81d5f9cccb1b091cfc5264bc483b0acc83a/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1724)

line 1628 sets the call's branch-weight (which is a single number, call counts) -- this should be overwritten by line 1724.

Line 1724 is guarded by !TI->extractProfTotalWeight(TempWeight) which is non-zero for invokes with !VP?

xur added a comment.Sep 7 2022, 11:01 AM

Thanks for the change. Wondering if you saw performance improvement with it.

The corresponding change on the sample profile side isn't included, do you have any plan for it?

Yes. For AutoFDO, we still need some change for the invoke in sampleloader
The situation is slightly different in AutoFDO. In FDO, we do branch_weight annotation first, then do the VP.

In AutoFDO, we do VP annotation first and then branch_weight annotation. We need to set the branch_weight metadata where there is already a VP metadata (which is the reverse of FDO).
I was planning to have a separated patch for AutoFDO, but I can also include it in this patch.

A separate patch sounds good, thanks.

AutoFDO is less problematic as branch_weights metadata overwrites VP in the end. (and many VP metadata is already consumed by the prompting in preinliner).

In FDO, VP metadata overwrites branch_weights -- This is bad as we lost more information in BPI/BFI.
(branch_weights is more important for downstream opt).

It looks like same happens to autofdo, i.e, no branch_weights assigned to indirect invokes? https://github.com/llvm/llvm-project/blob/7b81a81d5f9cccb1b091cfc5264bc483b0acc83a/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1628

Oh, I guess branch_weight is attached to invokes that are promoted from an indirect invoke, so it is better handled than the FDO case.

the branch weight for invoke is set in line 1724 (https://github.com/llvm/llvm-project/blob/7b81a81d5f9cccb1b091cfc5264bc483b0acc83a/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1724)

line 1628 sets the call's branch-weight (which is a single number, call counts) -- this should be overwritten by line 1724.

Line 1724 is guarded by !TI->extractProfTotalWeight(TempWeight) which is non-zero for invokes with !VP?

Yes. This is the place that need to be changed.
Here TI is the invoke. It can have either call-count branch weight metadata (set in 1628), or VP metadata set in line 878.

We need to set the right branch_weight here -- if there is a VP metadada, we create a compound one. If this is single callcount weight, we replace it with real branch_weight.

mingmingl commandeered this revision.Apr 20 2023, 5:13 PM
mingmingl added a reviewer: xur.
mingmingl added a comment.EditedApr 20 2023, 5:15 PM

Ohhh no!

I just want to do suggested edit for a comment (around IndirectCallPromotion.cpp line 316) but obviously this is not the correct way to do it.. I suppose there is a way you could get it back?

Ohhh no!

I just want to do suggested edit for a comment (around IndirectCallPromotion.cpp line 316) but obviously this is not the correct way to do it.. I suppose there is a way you could get it back?

Looks like you could commandeer it back (https://stackoverflow.com/questions/28296729/how-to-undo-commandeer-revision-in-phabricator) My apologies about it :(