This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Introduce MD5-based context-sensitive profile
Abandoned · Public

Authored by hoy on Jul 30 2021, 9:04 AM.

Details

Summary

String-based CS profiles can suffer severe size inflation for C++ programs with very long function names. In some extreme cases, we have seen the name table section take up 99% of the size of an extbinary profile, which in turn caused the compiler to OOM or slow down. To address this issue, we are enabling MD5-based CS profiles.

Unlike an MD5 non-CS profile, where MD5 codes are stored as integers in the name table section and can be used with extbinary only, an MD5 CS profile keeps the profile context in string form, with MD5 codes representing the functions in the context. It can therefore be used with both text and extbinary profiles.

Here is an example of a name-based CS text profile and its MD5 counterpart:

[main:3.1 @ _Z5funcBi]:120:19
 0: 19
 1: 19 _Z8funcLeafi:20
 3: 12

[0xdb956436e78dd5fa:3.1 @ 0x630ba95aaba8cb5]:120:19
 0: 19
 1: 19 0x62919f2827854931:20
 3: 12

Note that in the MD5 profile all function names are replaced by their MD5 codes.
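For illustration only, here is a minimal sketch (not code from this patch) of how a frame name could be rendered as the 0x-prefixed code shown above, assuming the codes are the low 64 bits of the MD5 digest as computed by llvm::MD5Hash:

  #include <cstdint>
  #include <string>

  #include "llvm/ADT/StringRef.h"
  #include "llvm/Support/MD5.h"
  #include "llvm/Support/raw_ostream.h"

  // Render a function name as the 0x-prefixed MD5 code used in the profile.
  // getMD5Name is an illustrative helper, not an API added by this patch.
  static std::string getMD5Name(llvm::StringRef FName) {
    uint64_t Hash = llvm::MD5Hash(FName); // low 64 bits of the MD5 digest
    std::string Str;
    llvm::raw_string_ostream OS(Str);
    OS << "0x";
    OS.write_hex(Hash);
    return OS.str();
  }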

The extbinary equivalents work similarly, with the context (either real names or MD5 codes) stored in the name table section. The main benefit of this is to avoid reconstructing the context string in the sample loader. As icing on the cake, it also allows mixed use of real names and MD5 codes, which we will need when we start squeezing the size of the pseudo probe descs.

Implementation
To support this string-based MD5 profile, we reuse part of the work done for the non-CS profile and diverge from the rest. The profile producers, i.e., llvm-profgen and llvm-profdata, need the special flag --use-md5 to generate an MD5 profile, which requires the internal flag FunctionSamples::UseMD5 to be set. The profile consumer, i.e., the sample profile loader, however, automatically detects whether a function name is a real name or an MD5 code based on the 0x prefix, and does not need FunctionSamples::UseMD5.
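A hedged sketch of that consumer-side check (the helper name is illustrative): a frame is treated as an MD5 code exactly when it parses as a 0x-prefixed hex integer, which is why the loader can do without FunctionSamples::UseMD5:

  #include <cstdint>

  #include "llvm/ADT/StringRef.h"

  // Return true and fill GUID if the frame is an MD5 code; a real function
  // name (no 0x prefix / not hex) returns false. Illustrative helper only.
  static bool isMD5Name(llvm::StringRef FName, uint64_t &GUID) {
    return FName.consume_front("0x") && !FName.getAsInteger(16, GUID);
  }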

The sample context tracker is tweaked to operate on integral MD5 codes internally, so that it can support contexts with both real names and MD5 codes. A GUIDToFuncNameMap, which is always built for CS profiles, can be used to look up real names for debugging.
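As a rough sketch of that debugging aid, assuming GUIDToFuncNameMap maps the 64-bit MD5 GUID back to the function name (the builder below is simplified; the real code canonicalizes names first):

  #include "llvm/ADT/DenseMap.h"
  #include "llvm/ADT/StringRef.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/Module.h"
  #include "llvm/Support/MD5.h"

  // Map each function's MD5 GUID back to its real name so that integral
  // codes inside the context tracker can be printed readably when debugging.
  static llvm::DenseMap<uint64_t, llvm::StringRef>
  buildGUIDToFuncNameMap(const llvm::Module &M) {
    llvm::DenseMap<uint64_t, llvm::StringRef> Map;
    for (const llvm::Function &F : M)
      Map[llvm::MD5Hash(F.getName())] = F.getName();
    return Map;
  }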

Testing
With the current change, the compiler generates exactly the same code with MD5 and non-MD5 CS profiles. Tested with SPEC and an internal large service. For the large service, the extbinary profile size went down by 10x and build time was cut in half.

Diff Detail

Event Timeline

hoy created this revision. Jul 30 2021, 9:04 AM
hoy requested review of this revision. Jul 30 2021, 9:04 AM
Herald added a project: Restricted Project. Jul 30 2021, 9:04 AM
hoy edited the summary of this revision. Jul 30 2021, 9:08 AM
hoy edited the summary of this revision.
hoy added reviewers: wmi, davidxl, wenlei, wlei.
hoy added inline comments. Jul 30 2021, 9:59 AM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
222

The hash-based ordering in FuncToCtxtProfiles is mainly to achieve consistent context promotion between MD5 and non-MD5 profiles, which in turn gives consistent codegen. However, it is expensive. I tried sorting by the combination of total sample counts and head sample counts, but still could not get every case covered. I think we might want to do this for non-MD5 profiles only, to favor MD5 performance.

wenlei added inline comments. Jul 30 2021, 12:13 PM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
222

On the cost of hashing strings: if we use a std::set<FunctionSamples *, ...> with a comparator that checks total-sample order and MD5 string order sequentially, we would get a stable order at low cost, since the MD5 string comparison is just a fallback that is rarely used, right?
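A minimal sketch of the ordering wenlei suggests, with a stub type standing in for FunctionSamples: compare by total samples first, and fall back to the context string only on ties, so string comparison (the expensive part) is rarely exercised:

  #include <cstdint>
  #include <set>
  #include <string>

  struct FunctionSamplesStub { // stand-in for llvm::sampleprof::FunctionSamples
    uint64_t TotalSamples = 0;
    std::string Context; // context string, real names or MD5 codes
  };

  struct ProfileOrder {
    bool operator()(const FunctionSamplesStub *A,
                    const FunctionSamplesStub *B) const {
      if (A->TotalSamples != B->TotalSamples)
        return A->TotalSamples > B->TotalSamples; // hotter profiles first
      return A->Context < B->Context; // rare tie-breaker, keeps order stable
    }
  };

  using FuncToCtxtProfileSet = std::set<FunctionSamplesStub *, ProfileOrder>;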

wmi added a comment. Jul 30 2021, 12:28 PM

Thanks for the work to reduce the CS profile size! It is something we really need.

I am also trying it in a slightly different way.
For "[main:3.1 @ _Z5funcBi]:120:19" in your example, I split the context string into multiple tuples of {string, line, discriminator}:
{main, 3, 1} {_Z5funcBi, 0, 0}, and for each name in the tuple, we will only save the index to the name table. So we will not have new entry in the name table for different contexts.

This way, we won't see any increase in the name table section compared with a non-CS profile, even when we use string-based names. We do need a new section, called CSNameTable, to store the tuples. When we read that section, we reconstruct the context string from the tuples, and the rest of the profile handling needs no change.

On top of this, we can also apply MD5 to the name table using the existing MD5 mechanism, to further compress the name table section.

I haven't finished the implementation, but I am close. It would be good to discuss how we converge our efforts here.
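A hedged sketch of the decomposition wmi describes, with illustrative names and field widths; each context frame becomes a tuple referencing the shared name table, and the tuples live in the new CSNameTable section:

  #include <cstdint>
  #include <vector>

  // One decomposed context frame: {name-table index, line, discriminator}.
  struct CSNameEntry {
    uint32_t NameIndex;     // index into the shared name table
    uint32_t Line;          // call-site line offset
    uint32_t Discriminator; // call-site discriminator
  };

  // "[main:3.1 @ _Z5funcBi]" decomposes into two entries:
  //   {index("main"), 3, 1} and {index("_Z5funcBi"), 0, 0}
  // so repeated function names across contexts share one name-table slot.
  using CSNameTableEntry = std::vector<CSNameEntry>;
  using CSNameTable = std::vector<CSNameTableEntry>;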

hoy added a comment. Jul 30 2021, 1:06 PM

> I am also trying it in a slightly different way. [...] It would be good to discuss how we converge our efforts here.

@wmi Good to know you are also working on improving the efficiency. Thanks!
The string-split design sounds interesting. I'm curious to know more about it, like how the context strings are split in the profile, and how they are constructed and represented in the compiler.

wenlei added a comment.

Thanks for working on this.

Decomposing context strings into two levels is a natural choice for deduplication. However, it means we would need to reconstruct the strings in the profile reader, so it is more of a space-vs-speed tradeoff. If compiler speed is the same, then storing decomposed strings in the file and reconstructing them on profile loading might be the better choice, as it is more compact and also more consistent with the non-CS profile. Perhaps it would be good to measure both. Right now, with MD5 profiles plus some context trimming through the cold threshold or pre-inliner tuning, we hope to bring e2e build time on par with AutoFDO for large workloads.

wmi added a comment. Aug 2 2021, 11:03 AM

> The string-split design sounds interesting. I'm curious to know more about it, like how the context strings are split in the profile, and how they are constructed and represented in the compiler.

Hi Hongtao, this is the implementation: https://reviews.llvm.org/D107299. It splits the context to deduplicate function names during profile writing, and reconstructs the context string during profile reading.

wmi added a comment. Aug 2 2021, 11:25 AM

> Decomposing context strings into two levels is a natural choice for deduplication. [...] Perhaps it would be good to measure both.

Right, D107173 on its own already lets me build our search benchmark in a distributed way, which I could not do before because of the OOM issue, and that is really helpful. https://reviews.llvm.org/D107299 will still run into OOM in distributed builds because we still have the full context strings after profile reading, and that consumes excessive memory. For D107299, I believe that issue can also be solved once I add MD5 support to it.

Like wenlei said, decomposing context strings gives us a more compact profile (deduplicating: 418M vs. MD5: 1.8G for the profile collected from our search test), but reconstruction takes more time (2x). I am thinking about whether reconstructing the context string is necessary at all, since the context represented in ContextTrieNode is also decomposed into multiple levels.

By the way, the context reconstruction time may also be reduced if D107299 is coupled with MD5.

hoy added a comment. Aug 2 2021, 12:18 PM

> Like wenlei said, decomposing context strings gives us a more compact profile (deduplicating: 418M vs. MD5: 1.8G for the profile collected from our search test), but reconstruction takes more time (2x). [...]

@wmi Thanks for sharing your change and the numbers! And thanks for trying my implementation. I'm interested whether you have seen a build time improvement with it?

I think name splitting is a promising idea. It's worth trying to not reconstruct the full context strings in the compiler. I talked with Wenlei offline: instead of a StringRef field in SampleContext, which also serves as the key to uniquely identify the context, we could use something like vector<StringRef>. There are likely more related issues; special handling might also be needed in some places for MD5 profiles, where not every MD5 code maps to a real function in the current module. Let us know if you want to give it a shot.
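A rough sketch of the direction floated here, with stub types (the real representation was settled in later patches): SampleContext carrying a vector of frames instead of one flattened string:

  #include <cstdint>

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/ADT/StringRef.h"

  // One decomposed frame: a function (real name or MD5 code) plus the
  // call-site location within its caller.
  struct ContextFrameStub {
    llvm::StringRef FuncName;
    uint32_t Line = 0;
    uint32_t Discriminator = 0;
  };

  // A context is then a frame stack rather than a flattened string: a single
  // frame for AFDO, the full call-site stack for CSSPGO.
  struct SampleContextStub {
    llvm::SmallVector<ContextFrameStub, 8> Frames;
  };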

wmi added a comment. Aug 2 2021, 2:03 PM

> @wmi Thanks for sharing your change and the numbers! And thanks for trying my implementation. I'm interested whether you have seen a build time improvement with it?

I didn't compare it with the case without MD5 because I cannot build successfully in our build system without MD5. I tried building locally, but it took one week to finish.

I tried dumping the profile to text format, and the MD5 version took about half the time to finish compared with the case without MD5, so that is an indication we should see a build time improvement with it.

> I think name splitting is a promising idea. [...] Let us know if you want to give it a shot.

Thanks for confirming the feasibility of the idea of not reconstructing the full context string in the compiler. I will try the idea.

wenlei added a comment.

> I tried dumping the profile to text format, and the MD5 version took about half the time to finish compared with the case without MD5, so that is an indication we should see a build time improvement with it.

Curious, were you trying the probe-based CS profile or the line-based CS profile?

> Thanks for confirming the feasibility of the idea of not reconstructing the full context string in the compiler. I will try the idea.

Yeah, eliminating the round-trip conversion is going to help. One of the reasons we chose to store the context as a single string was to accommodate how profiles are kept in the profile reader/writer using a StringMap. If we store context strings in decomposed form in the profile file, supporting a corresponding decomposed context representation as a first-class citizen inside the compiler, to avoid the conversion, would be great.

Now looking at the use of SampleProfileReader::getProfiles again, the key (name/context) string is actually redundant, since FunctionSamples contains the function name/context too. The StringMap is used by AFDO for point lookup by name. But for CSSPGO, lookup is served by SampleContextTracker, and the tracker is built by iterating over all profiles, so the full context string as the StringMap key isn't really useful for CSSPGO; it is more a choice made for consistency.

If we want to skip reconstructing the full context string from decomposed context input, we could change the string conversion for SampleContext to produce a dummy string (e.g., an MD5 string of the context?) instead of the reconstructed full context string, and use that as the key for the StringMap of profiles. Or we could change the interface of SampleProfileReader::getProfiles to return something like map<SampleContext, FunctionSamples>&, and always use a SampleContext object for lookup - the main content of SampleContext would be a StringRef (AFDO) or a vector of StringRefs (CSSPGO). We would also need to be able to order SampleContexts as before - that is used when loading a profile subtree to prepare for importing.
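A sketch of the second option, again with illustrative stub types: profiles keyed by an ordered SampleContext, where a lexicographic frame order supplies the stable ordering needed for subtree loading:

  #include <cstdint>
  #include <map>
  #include <string>
  #include <tuple>
  #include <vector>

  struct ContextFrame {
    std::string FuncName;
    uint32_t Line = 0, Discriminator = 0;
    bool operator<(const ContextFrame &O) const {
      return std::tie(FuncName, Line, Discriminator) <
             std::tie(O.FuncName, O.Line, O.Discriminator);
    }
  };

  struct SampleContextKey {
    std::vector<ContextFrame> Frames; // one frame for AFDO, a stack for CSSPGO
    bool operator<(const SampleContextKey &O) const {
      return Frames < O.Frames; // lexicographic: stable subtree ordering
    }
  };

  struct FunctionSamplesStub { uint64_t TotalSamples = 0; };

  // The getProfiles interface would then hand back this map directly.
  using ProfileMapTy = std::map<SampleContextKey, FunctionSamplesStub>;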

wmi added a comment. Aug 3 2021, 10:02 AM

> Curious, were you trying the probe-based CS profile or the line-based CS profile?

I was trying the probe-based CS profile.

> If we want to skip reconstructing the full context string from decomposed context input, we could change the string conversion for SampleContext to produce a dummy string [...] Or we could change the interface of SampleProfileReader::getProfiles to return something like map<SampleContext, FunctionSamples>&, and always use a SampleContext object for lookup. [...]

Thanks for the suggestion. That is very helpful.

wmi added a comment. Aug 4 2021, 9:35 PM

> I think name splitting is a promising idea. It's worth trying to not reconstruct the full context strings in the compiler. [...] Let us know if you want to give it a shot.

I tried the idea of using vector<Callsite> (Callsite being a struct containing a function name and a LineLocation) in SampleContext and changed the ProfileMap to map<SampleContext, FunctionSamples> over the last two days. I believe the idea should work, but I found the required change was much larger than I thought, since ProfileMap is used in many interfaces. I am worried that for the moment I cannot devote enough time to finish the whole change, so for now I tend to use Hongtao's MD5 patch and leave the further improvement for the future, since the current patch has already significantly improved the build process and unblocked our experiments. We can always come back and revisit it. What is your opinion?

hoy added a comment. Aug 5 2021, 11:47 AM

> I tried the idea of using vector<Callsite> (Callsite being a struct containing a function name and a LineLocation) in SampleContext and changed the ProfileMap to map<SampleContext, FunctionSamples> over the last two days. [...] We can always come back and revisit it. What is your opinion?

Thanks for the heads-up, Wei. Sounds good to stick with this patch for now if you need more time to complete your idea, which I think will get us a bigger win and be more aligned with the non-CS implementation. Will you continue working on it?

wmi added a comment. Aug 5 2021, 12:29 PM

> Thanks for the heads-up, Wei. Sounds good to stick with this patch for now [...] Will you continue working on it?

Probably I cannot work on it this quarter, so if you would like to work on it, feel free to take D107299 and I will be happy to review it. If you think getting this patch in as a short-term solution is preferred, I can start reviewing it.

hoy added a comment. Aug 10 2021, 2:43 PM

> Probably I cannot work on it this quarter, so if you would like to work on it, feel free to take D107299 and I will be happy to review it. [...]

Thanks for the heads-up, @wmi. I will take your current change and work from there.

wenlei added a comment.

> Thanks for the heads-up, @wmi. I will take your current change and work from there.

Sounds good. Split names could be a better solution. Btw, I suggest we separate out the context tracker change to use MD5 - it should be orthogonal.

hoy abandoned this revision. Aug 31 2021, 4:22 PM