This is an archive of the discontinued LLVM Phabricator instance.

[SampleFDO] Support enabling -funique-internal-linkage-name
ClosedPublic

Authored by wmi on Feb 17 2021, 8:41 PM.

Details

Summary

Now we have -funique-internal-linkage-name support, we want to flip it on by default since it is beneficial to have separate sample profiles for different internal symbols with the same name. As a preparation, we want to avoid regression caused by the flip.

There are two kinds of possible regressions. One kind of regression is when we flip -funique-internal-linkage-name on, the profile is collected from binary built without -funique-internal-linkage-name so it has no uniq suffix, but the IR in the optimized build contains the suffix. This may introduce transient regression. In rare case, user may also want to disable -funique-internal-linkage-name after it has been enabled for a while, to triage something for example. In that case, profile will contain the suffix but IR in the optimized build won't.

To avoid such mismatch, we introduce a NameTable section flag indicating whether there is any name in the profile containing uniq suffix. Compiler will decide whether to keep uniq suffix during name canonicalization depending on the NameTable section flag. The flag is only available for extbinary format. For other formats, by default compiler will keep uniq suffix so they will only experience transient regression when -funique-internal-linkage-name is just flipped.

Another type of regression is caused by places where we miss to call getCanonicalFnName. Those places are fixed.

Diff Detail

Event Timeline

wmi created this revision.Feb 17 2021, 8:41 PM
wmi requested review of this revision.Feb 17 2021, 8:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 17 2021, 8:41 PM
wmi added a comment.Feb 17 2021, 8:44 PM

I tested the patch. For our internal search benchmark, there was no regression when we enabled the flag in optimized build but still used a profile without uniq suffix. If we enabled the flag and also used a profile containing uniq suffix, we saw 0.5% improvement.

In D96932#2570512, @wmi wrote:

I tested the patch. For our internal search benchmark, there was no regression when we enabled the flag in optimized build but still used a profile without uniq suffix. If we enabled the flag and also used a profile containing uniq suffix, we saw 0.5% improvement.

Thanks for the improvements. We also noticed the discrepancy in name canonicalization between SymbolMap insertion and getCanonicalFnName, and was going to fix it. We're enabling unique name for CSSPGO too because CFG checksum is much less tolerant and mismatch checksum throttles performance. We saw 10% perf delta on spec2017/perlbench w/ and w/o unique name for CSSPGO. But wasn't expecting something like 0.5% for baseline AutoFDO on larger workloads.. Thanks for the data point..

Another type of regression is caused by places where we miss to call getCanonicalFnName. Those places are fixed.

For context tracker, we have canonicalization done in llvm-profgen as part of profile generation which is why we didn't canonicalize on the compiler side.. Now with the new model that let compiler decide whether to strip suffix based on section flag, we should skip canonicalization llvm-profgen, does that sound right? I guess your profile generation tool doesn't do name canonicalization?

wmi added a comment.Feb 18 2021, 10:19 AM

Another type of regression is caused by places where we miss to call getCanonicalFnName. Those places are fixed.

For context tracker, we have canonicalization done in llvm-profgen as part of profile generation which is why we didn't canonicalize on the compiler side.. Now with the new model that let compiler decide whether to strip suffix based on section flag, we should skip canonicalization llvm-profgen, does that sound right? I guess your profile generation tool doesn't do name canonicalization?

Another usage of canonicalization is to merge profile. For example, suppose we have a function clone from const propagation, and we want to merge the profile from the clone with the profile from its original body. It is easier to handle it through llvm-profgen and make the profile size smaller before feed it into compiler. Our profile generation tool does name canonicalization.

Thanks for doing this! Much appreciated!

llvm/include/llvm/ProfileData/SampleProf.h
774

As an aside, where does ".part." come from?

784
Do we need a test where "llvm", "part" and "__uniq" appear in the symbol name?  Also, isn't it "__part"?
llvm/include/llvm/ProfileData/SampleProfReader.h
458
typo: "__uniq"
llvm/lib/ProfileData/SampleProf.cpp
38

Why is this true by default?

251

Could you also add a code comment on how this default could be wrong if at all and its implications?

llvm/lib/ProfileData/SampleProfWriter.cpp
211
Another suggestion to store __uniq as a constant somewhere and reuse that .
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
212

Saw this code snippet for the second time, also used in SampleProf.cpp. Could we refactor to one function?

hoy added a subscriber: wlei.Feb 20 2021, 12:07 AM

Thanks for working on this. Good to know unique linkage names helps AutoFDO with a 0.5% gain from Google benchmarks.

In D96932#2572160, @wmi wrote:

Another type of regression is caused by places where we miss to call getCanonicalFnName. Those places are fixed.

For context tracker, we have canonicalization done in llvm-profgen as part of profile generation which is why we didn't canonicalize on the compiler side.. Now with the new model that let compiler decide whether to strip suffix based on section flag, we should skip canonicalization llvm-profgen, does that sound right? I guess your profile generation tool doesn't do name canonicalization?

Another usage of canonicalization is to merge profile. For example, suppose we have a function clone from const propagation, and we want to merge the profile from the clone with the profile from its original body. It is easier to handle it through llvm-profgen and make the profile size smaller before feed it into compiler. Our profile generation tool does name canonicalization.

Good point about the cloned functions. I'm wondering if the canonicalization should be uniformly done in the compiler in every place does name mapping, i.e, from Function to Profile or backwards. The profile generator may not have a full view of the IR and it may not strip away the suffixes accurately without seeing the sample-profile-suffix-elision-policy attribute. cc @wlei

llvm/include/llvm/ProfileData/SampleProf.h
774

Can you please add a comment about the order of these suffixes? Looks like it's in reversed appending order, i.e., "._uniq." is always the first suffix to the original name.

wmi added a comment.Feb 22 2021, 5:52 PM

For context tracker, we have canonicalization done in llvm-profgen as part of profile generation which is why we didn't canonicalize on the compiler side.. Now with the new model that let compiler decide whether to strip suffix based on section flag, we should skip canonicalization llvm-profgen, does that sound right? I guess your profile generation tool doesn't do name canonicalization?

Another usage of canonicalization is to merge profile. For example, suppose we have a function clone from const propagation, and we want to merge the profile from the clone with the profile from its original body. It is easier to handle it through llvm-profgen and make the profile size smaller before feed it into compiler. Our profile generation tool does name canonicalization.

Good point about the cloned functions. I'm wondering if the canonicalization should be uniformly done in the compiler in every place does name mapping, i.e, from Function to Profile or backwards. The profile generator may not have a full view of the IR and it may not strip away the suffixes accurately without seeing the sample-profile-suffix-elision-policy attribute. cc @wlei

Right, that is valid concern. I remember the function attribute was added because Go lang was trying to add some SampleFDO support and they wanted to use a different elision policy than C++, but they havn't really used it, so we currently get away with it by using a same elision strategy for all symbols when we process a profile. I agree it is better to do all the canonicalization work in compiler. Profile merging is less of a concern.

llvm/include/llvm/ProfileData/SampleProf.h
774

It is there for a long time. I guess it is copied from gcc because gcc has such suffix like '.isra.' and '.part.' . I cannot find the suffix in llvm code base. I just notice that bb section add '.part.' suffix and I havn't understand in which case we add the '.part.' suffix. If it needs some change as well, we can address it in a separate patch.

784

I updated the test to include the case with name like "symbol.__uniq.123.llvm.456"

llvm/include/llvm/ProfileData/SampleProfReader.h
458

Thanks for the catch. Fixed.

llvm/lib/ProfileData/SampleProf.cpp
38

Because for extbinary profile format, we can have the flag in the binary showing the profile has name with ".__uniq." suffix, so nothing to be worry about.

However, for other types of format, we don't have the flag to showing that. We can examine each name when we read the profile to see whether there is the suffix, but that will increase compile time. The solution is, for other format, we make the default behavior to assume there is ".uniq." suffix (i.e. make FunctionSamples::HasUniqSuffix default true). It will have profile mismatch when the profile doesn't contain ".uniq." suffix while the IR in the optimized build contains ".__uniq." suffix. That will be a transient regression for the format other than extbinary. I think that is the tradeoff to minimize impact for profile other than extbinary format when we enable -funique-internal-linkage-name by default.

251

I added a comment for it.

llvm/lib/ProfileData/SampleProfWriter.cpp
211

Good point. Done.

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
212

Good suggestion. Done.

wmi updated this revision to Diff 325662.Feb 22 2021, 8:59 PM

Address Sri and Hongtao's comments.

hoy added inline comments.Feb 22 2021, 11:39 PM
llvm/include/llvm/ProfileData/SampleProf.h
614

Nit: canonicalization?

620

How about making this a static field of FunctionSamples?

722

Is SymbolMap always available to this function? Looks like it could be null for other functions like findFunctionSamplesAt.

llvm/lib/Transforms/IPO/SampleProfile.cpp
2049

Nit: "" -> StringRef() ?

wmi added inline comments.Feb 23 2021, 4:36 PM
llvm/include/llvm/ProfileData/SampleProf.h
614

Thanks, fixed.

620

That will look clearer, but I am not sure whether it will introduce data race problem. Look at the old problem Wenlei fixed: https://reviews.llvm.org/rG4b99b58a847c8424d9992a1e92a9f8ae7e4d7b51. The existing static members are of simple types and are all initialized with the same value for all the modules so they may be fine. SymbolMap read/write could be prone to such problem.

722

Yes, the function is used to collect function GUID with profile but not present in current module for ThinLTO importing. That function is only used in SampleLoaderPass so SymbolMap is always available. findFunctionSamplesAt has been used more widely than just SampleLoaderPass so SymbolMap is not always available.

llvm/lib/Transforms/IPO/SampleProfile.cpp
2049

Fixed.

wmi updated this revision to Diff 325940.Feb 23 2021, 5:03 PM

Address Hongtao's comment.

hoy added inline comments.Feb 23 2021, 5:33 PM
llvm/include/llvm/ProfileData/SampleProf.h
620

I actually meant to make the SymbolMap pointer as a static field. Does that sound safe?

wmi added inline comments.Feb 23 2021, 6:03 PM
llvm/include/llvm/ProfileData/SampleProf.h
620

Each module will have its own SymbolMap so the thread for one module may see other module's SymbolMap pointer.

I think the other static vars in FunctionSamples can get away with the problem only because they are all initialized to the same value for different modules using the same profile.

hoy added inline comments.Feb 23 2021, 6:23 PM
llvm/include/llvm/ProfileData/SampleProf.h
620

Oh yeah, SymbolMap is per module. Threading will be a problem.

722

As far as I understand, SymbolMap is always available in compiler but not tools. Does that mean tools should never do canonicalization? Sorry, forgot to ask this earlier.

wmi added inline comments.Feb 23 2021, 8:46 PM
llvm/include/llvm/ProfileData/SampleProf.h
722

Right, like we discussed here: https://reviews.llvm.org/D96932#2580530, I agree it is better to let compiler do all the canonicalization work.

hoy added a comment.Feb 23 2021, 9:51 PM

Are the test failures related?

llvm/include/llvm/ProfileData/SampleProf.h
722

Gotcha. We have a couple places in llvm-profgen that call getCanonicalFnName which may not work well with unique linkage names. Do you think they should be removed?

wmi added a comment.Feb 23 2021, 10:41 PM
In D96932#2583927, @hoy wrote:

Are the test failures related?

Thanks for reminding me. I don't have the test failure locally. The error says "Unrecognized sample profile encoding format". I guess it is because those profiles are binaries and the diff cannot be applied on them properly.

llvm/include/llvm/ProfileData/SampleProf.h
722

Yes, they should be removed. Do you think whether I should wait to commit it after llvm-profgen is changed accordingly? Is it possible to cause llvm-profgen problem or it may be fine as long as -funique-internal-linkage-names hasn't been enabled by default?

hoy accepted this revision.Feb 23 2021, 11:21 PM

BTW, in case of duplicated functions due to multi-versioning, the profile generator may not be able to merge them as expected without properly canonicalizing function names, the compiler might be able to achieve that though. As of now, it seems that the last profile for a given canonical name always supersedes all others in the compiler reader. This may need a change for a "merge" instead.

llvm/include/llvm/ProfileData/SampleProf.h
722

It's fine to have this in first. We already identified the unique linkage name problem with llvm-profgen and started removing the getCanonicalFnName calls. cc @wlei

This revision is now accepted and ready to land.Feb 23 2021, 11:21 PM
wmi added a comment.Feb 24 2021, 10:57 AM
In D96932#2584061, @hoy wrote:

BTW, in case of duplicated functions due to multi-versioning, the profile generator may not be able to merge them as expected without properly canonicalizing function names, the compiler might be able to achieve that though. As of now, it seems that the last profile for a given canonical name always supersedes all others in the compiler reader. This may need a change for a "merge" instead.

Right. Do context string need to be canonicalized? If it is needed, the current canonicalization needs some enhancement to support that.

hoy added a comment.Feb 24 2021, 11:03 AM
In D96932#2585513, @wmi wrote:
In D96932#2584061, @hoy wrote:

BTW, in case of duplicated functions due to multi-versioning, the profile generator may not be able to merge them as expected without properly canonicalizing function names, the compiler might be able to achieve that though. As of now, it seems that the last profile for a given canonical name always supersedes all others in the compiler reader. This may need a change for a "merge" instead.

Right. Do context string need to be canonicalized? If it is needed, the current canonicalization needs some enhancement to support that.

A couple places in samplecontexttracker will need call getCanonicalFnName. What enhancement is needed in addition to that?

wmi added a comment.Feb 24 2021, 11:11 AM
In D96932#2585530, @hoy wrote:
In D96932#2585513, @wmi wrote:
In D96932#2584061, @hoy wrote:

BTW, in case of duplicated functions due to multi-versioning, the profile generator may not be able to merge them as expected without properly canonicalizing function names, the compiler might be able to achieve that though. As of now, it seems that the last profile for a given canonical name always supersedes all others in the compiler reader. This may need a change for a "merge" instead.

Right. Do context string need to be canonicalized? If it is needed, the current canonicalization needs some enhancement to support that.

A couple places in samplecontexttracker will need call getCanonicalFnName. What enhancement is needed in addition to that?

For "foo1.llvm.xxx:3 @ foo2.llvm.xxx:2 @ foo3.__uniq.xxx", do we need getCanonicalFnName to canonicalize the whole context string? Or it is divided in samplecontexttracker and getCanonicalFnName will be called for each level?

hoy added a comment.Feb 24 2021, 4:34 PM
In D96932#2585567, @wmi wrote:
In D96932#2585530, @hoy wrote:
In D96932#2585513, @wmi wrote:
In D96932#2584061, @hoy wrote:

BTW, in case of duplicated functions due to multi-versioning, the profile generator may not be able to merge them as expected without properly canonicalizing function names, the compiler might be able to achieve that though. As of now, it seems that the last profile for a given canonical name always supersedes all others in the compiler reader. This may need a change for a "merge" instead.

Right. Do context string need to be canonicalized? If it is needed, the current canonicalization needs some enhancement to support that.

A couple places in samplecontexttracker will need call getCanonicalFnName. What enhancement is needed in addition to that?

For "foo1.llvm.xxx:3 @ foo2.llvm.xxx:2 @ foo3.__uniq.xxx", do we need getCanonicalFnName to canonicalize the whole context string? Or it is divided in samplecontexttracker and getCanonicalFnName will be called for each level?

Good point. Yes, it's a concerned that the suffixes have to be removed by compiler where the underlying profile context strings are not reusable and new strings have to be made. @wenlei will have more comments about this.

Actually we are not canonicalizing the names in profile from compiler side, right? We are only canonicalizing names from IR to match names from profile.

Thinking about it more, doing canonicalization in compiler for CSSPGO may be trickier, because we try to reuse a single string for a context. It's feasible to canonicalize context strings in compiler, but it will have cost.

One kind of regression is when we flip -funique-internal-linkage-name on, the profile is collected from binary built without -funique-internal-linkage-name so it has no uniq suffix, but the IR in the optimized build contains the suffix. This may introduce transient regression.

This could happen when we migrate. To tackle this one, SecFlagUniqSuffix added in this patch is good enough and we don't need to canonicalize names in profile from compiler side, as they can still be done in the profile generation tools.

In rare case, user may also want to disable -funique-internal-linkage-name after it has been enabled for a while, to triage something for example. In that case, profile will contain the suffix but IR in the optimized build won't.

How is this handled currently? Also to me this feel not so important and it's more like debugging helper. If we don't intend to support such case, I don't see a need to move all name canonicalization from profile generation tool into the compiler..

llvm/lib/ProfileData/SampleProfWriter.cpp
210

Ideally, this flag should be derived from whether the binary was built with unique names. If we try to derive the flag from existing names, whether a profile happens to not contain static symbol (even if -funique-internal-linkage-name is always on) may affect how names are canonicalized for pgo pass2 (if we have new static symbols there)..

Not sure if it would happen in practice, but this is more from consistency angle rather than a practical concern..

wmi added a comment.Feb 24 2021, 11:02 PM

Actually we are not canonicalizing the names in profile from compiler side, right? We are only canonicalizing names from IR to match names from profile.

That is right.

Thinking about it more, doing canonicalization in compiler for CSSPGO may be trickier, because we try to reuse a single string for a context. It's feasible to canonicalize context strings in compiler, but it will have cost.

One kind of regression is when we flip -funique-internal-linkage-name on, the profile is collected from binary built without -funique-internal-linkage-name so it has no uniq suffix, but the IR in the optimized build contains the suffix. This may introduce transient regression.

This could happen when we migrate. To tackle this one, SecFlagUniqSuffix added in this patch is good enough and we don't need to canonicalize names in profile from compiler side, as they can still be done in the profile generation tools.

In rare case, user may also want to disable -funique-internal-linkage-name after it has been enabled for a while, to triage something for example. In that case, profile will contain the suffix but IR in the optimized build won't.

How is this handled currently? Also to me this feel not so important and it's more like debugging helper. If we don't intend to support such case, I don't see a need to move all name canonicalization from profile generation tool into the compiler.

Thanks for pointing it out. I found I made a wrong declaration about it. I mistakenly thought the profile name was canonicalized, and even that is true, it is still tricky to support this scenario. Sorry about the wrong statement. But hopefully the case is rare enough and we don't need to support it too.

The need to move canonicalization to compiler is that profile generation tool cannot see the function attribute. An example is if the profile is generated from a go/c++ mixed code, go may have different canonicalization rule than c++. To be consistent with compiler, the profile generation tool needs to know which symbol is a go symbol and which is a c++ symbol. I don't know whether it is always easy to tell just from the name pattern. Currently it is still not an issue for us (go team did some work on autofdo support before but is not actively working on the support now), but I don't know whether it is needed for us in the future.

In D96932#2586797, @wmi wrote:

Actually we are not canonicalizing the names in profile from compiler side, right? We are only canonicalizing names from IR to match names from profile.

That is right.

Thinking about it more, doing canonicalization in compiler for CSSPGO may be trickier, because we try to reuse a single string for a context. It's feasible to canonicalize context strings in compiler, but it will have cost.

One kind of regression is when we flip -funique-internal-linkage-name on, the profile is collected from binary built without -funique-internal-linkage-name so it has no uniq suffix, but the IR in the optimized build contains the suffix. This may introduce transient regression.

This could happen when we migrate. To tackle this one, SecFlagUniqSuffix added in this patch is good enough and we don't need to canonicalize names in profile from compiler side, as they can still be done in the profile generation tools.

In rare case, user may also want to disable -funique-internal-linkage-name after it has been enabled for a while, to triage something for example. In that case, profile will contain the suffix but IR in the optimized build won't.

How is this handled currently? Also to me this feel not so important and it's more like debugging helper. If we don't intend to support such case, I don't see a need to move all name canonicalization from profile generation tool into the compiler.

Thanks for pointing it out. I found I made a wrong declaration about it. I mistakenly thought the profile name was canonicalized, and even that is true, it is still tricky to support this scenario. Sorry about the wrong statement. But hopefully the case is rare enough and we don't need to support it too.

The need to move canonicalization to compiler is that profile generation tool cannot see the function attribute. An example is if the profile is generated from a go/c++ mixed code, go may have different canonicalization rule than c++. To be consistent with compiler, the profile generation tool needs to know which symbol is a go symbol and which is a c++ symbol. I don't know whether it is always easy to tell just from the name pattern. Currently it is still not an issue for us (go team did some work on autofdo support before but is not actively working on the support now), but I don't know whether it is needed for us in the future.

Thanks for the clarification. In that case, agreed that we don't need to support the 2nd case of mismatch. For profile name canonicalization inside compiler, since it will have cost for csspgo, I'd suggest we leave it in llvm-profgen (we don't strip unique suffix there, but still handle others, i.e. getCanonicalFnName(.., "selected")) until the need becomes more substantial.. This change isn't doing that for either afdo or csspgo, so looks good on that front.

wenlei added inline comments.Feb 25 2021, 8:51 AM
llvm/include/llvm/ProfileData/SampleProf.h
822

If this is ok for inlinee, is it ok for standalone functions as well? I think this SymbolMap lookup has lead to extra function argument in quite a few places, so wondering how critical is that given that it still doesn't cover the inlined cases.

IIUC, with SecFlagUniqSuffix plus using "selected" policy and the fix for canonicalization upon symbolMap insertion, this patch already supported migration to turn on -funique-internal-linkage-name without regression, if c++/go mix isn't a concern, which sounds like a separate issue for the future. And if we want to cover c++/go mix, perhaps something that handles both inline and non-inline cases would be needed?

wmi added inline comments.Feb 25 2021, 11:00 AM
llvm/include/llvm/ProfileData/SampleProf.h
822

Right, currently if we simply use "selected" strategy for all here, it is of no problem in terms of supporting -funique-internal-linkage-names. Using SymbolMap here is a best effort but like you said we need to add extra function argument in a few places. Using "selected" strategy by default here and leaving the change of using SymbolMap till the point when we really need it sounds good with me.

Fundamentally I think compiler should keep declaration when a function is all inlined. The inlined and deleted function could have its name referred in debug information, and it is possible their function attributes will be needed for multiple usages.

wmi updated this revision to Diff 326860.Feb 26 2021, 6:08 PM

Address Wenlei's comment. Also change the default elide strategy to "selected" because I think that is the most commonly used strategy.

wmi updated this revision to Diff 326863.Feb 26 2021, 6:11 PM

Fixed a typo.

wenlei accepted this revision.Mar 8 2021, 9:09 PM

lgtm, thanks.

This revision was landed with ongoing or failed builds.Mar 9 2021, 9:42 PM
This revision was automatically updated to reflect the committed changes.