This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Introduce MD5-based context-sensitive profile
Abandoned · Public

Authored by hoy on Jul 30 2021, 9:04 AM.

Details

Summary

String-based CS profiles can suffer severe size inflation for C++ programs with very long function names. In some extreme cases, we have seen the name table section take up 99% of the size of an extbinary profile, which in turn caused the compiler to OOM or slow down. To address this issue, we are enabling MD5-based CS profiles.

Unlike an MD5 non-CS profile, where MD5 codes are stored as integers in the name table section and can be used with extbinary only, an MD5 CS profile keeps the profile context in string form, with MD5 codes representing the functions in the context. It can therefore be used with both text and extbinary profiles.

Here is an example of a name-based CS text profile and its MD5 counterpart:

[main:3.1 @ _Z5funcBi]:120:19
 0: 19
 1: 19 _Z8funcLeafi:20
 3: 12

[0xdb956436e78dd5fa:3.1 @ 0x630ba95aaba8cb5]:120:19
 0: 19
 1: 19 0x62919f2827854931:20
 3: 12

Note that in the MD5 profile all function names are replaced by their MD5 codes.
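For illustration only, here is a minimal sketch (not code from this patch) of how a frame name could be rendered as the 0x-prefixed code shown above, assuming the codes are the low 64 bits of the MD5 digest as computed by llvm::MD5Hash:

  #include <cstdint>
  #include <string>

  #include "llvm/ADT/StringRef.h"
  #include "llvm/Support/MD5.h"
  #include "llvm/Support/raw_ostream.h"

  // Render a function name as the 0x-prefixed MD5 code used in the profile.
  // getMD5Name is an illustrative helper, not an API added by this patch.
  static std::string getMD5Name(llvm::StringRef FName) {
    uint64_t Hash = llvm::MD5Hash(FName); // low 64 bits of the MD5 digest
    std::string Str;
    llvm::raw_string_ostream OS(Str);
    OS << "0x";
    OS.write_hex(Hash);
    return OS.str();
  }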

The extbinary equivalents work similarly, with the context (either real names or MD5 codes) stored in the name table section. The main benefit of this is to avoid reconstructing the context string in the sample loader. As icing on the cake, it also allows mixed use of real names and MD5 codes, which we will need when we start squeezing the size of the pseudo probe descs.

Implementation
To support this string-based MD5 profile, we reuse part of the work done for the non-CS profile and diverge from the rest. The profile producers, i.e., llvm-profgen and llvm-profdata, need the special flag --use-md5 to generate an MD5 profile, which requires the internal flag FunctionSamples::UseMD5 to be set. The profile consumer, i.e., the sample profile loader, however, automatically detects whether a function name is a real name or an MD5 code based on the 0x prefix, and does not need FunctionSamples::UseMD5.
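A hedged sketch of that consumer-side check (the helper name is illustrative): a frame is treated as an MD5 code exactly when it parses as a 0x-prefixed hex integer, which is why the loader can do without FunctionSamples::UseMD5:

  #include <cstdint>

  #include "llvm/ADT/StringRef.h"

  // Return true and fill GUID if the frame is an MD5 code; a real function
  // name (no 0x prefix / not hex) returns false. Illustrative helper only.
  static bool isMD5Name(llvm::StringRef FName, uint64_t &GUID) {
    return FName.consume_front("0x") && !FName.getAsInteger(16, GUID);
  }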

The sample context tracker is tweaked to operate on integral MD5 codes internally, so that it can support contexts with both real names and MD5 codes. A GUIDToFuncNameMap, which is always built for CS profiles, can be used to look up real names for debugging.
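As a rough sketch of that debugging aid, assuming GUIDToFuncNameMap maps the 64-bit MD5 GUID back to the function name (the builder below is simplified; the real code canonicalizes names first):

  #include "llvm/ADT/DenseMap.h"
  #include "llvm/ADT/StringRef.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/Module.h"
  #include "llvm/Support/MD5.h"

  // Map each function's MD5 GUID back to its real name so that integral
  // codes inside the context tracker can be printed readably when debugging.
  static llvm::DenseMap<uint64_t, llvm::StringRef>
  buildGUIDToFuncNameMap(const llvm::Module &M) {
    llvm::DenseMap<uint64_t, llvm::StringRef> Map;
    for (const llvm::Function &F : M)
      Map[llvm::MD5Hash(F.getName())] = F.getName();
    return Map;
  }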

Testing
With the current change, the compiler generates exactly the same code with MD5 and non-MD5 CS profiles. Tested with SPEC and an internal large service. For the large service, the extbinary profile size went down by 10x and build time was cut in half.

Diff Detail

Event Timeline

hoy created this revision. Jul 30 2021, 9:04 AM
hoy requested review of this revision. Jul 30 2021, 9:04 AM
Herald added a project: Restricted Project. Jul 30 2021, 9:04 AM
hoy edited the summary of this revision. Jul 30 2021, 9:08 AM
hoy edited the summary of this revision.
hoy added reviewers: wmi, davidxl, wenlei, wlei.
hoy added inline comments. Jul 30 2021, 9:59 AM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
222

The hash-based ordering in FuncToCtxtProfiles is mainly to achieve consistent context promotion between MD5 and non-MD5 profiles, which in turn gives consistent codegen. However, it is expensive. I tried sorting by the combination of total sample counts and head sample counts, but still could not get every case covered. I think we might want to do this for non-MD5 profiles only, to favor MD5 performance.

wenlei added inline comments. Jul 30 2021, 12:13 PM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
222

On the cost of hashing strings: if we use a std::set<FunctionSamples *, ...> with a comparator that checks total-sample order and MD5 string order sequentially, we would get a stable order at low cost, since the MD5 string comparison is just a fallback that is rarely used, right?
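A minimal sketch of the ordering wenlei suggests, with a stub type standing in for FunctionSamples: compare by total samples first, and fall back to the context string only on ties, so string comparison (the expensive part) is rarely exercised:

  #include <cstdint>
  #include <set>
  #include <string>

  struct FunctionSamplesStub { // stand-in for llvm::sampleprof::FunctionSamples
    uint64_t TotalSamples = 0;
    std::string Context; // context string, real names or MD5 codes
  };

  struct ProfileOrder {
    bool operator()(const FunctionSamplesStub *A,
                    const FunctionSamplesStub *B) const {
      if (A->TotalSamples != B->TotalSamples)
        return A->TotalSamples > B->TotalSamples; // hotter profiles first
      return A->Context < B->Context; // rare tie-breaker, keeps order stable
    }
  };

  using FuncToCtxtProfileSet = std::set<FunctionSamplesStub *, ProfileOrder>;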

wmi added a comment. Jul 30 2021, 12:28 PM

Thanks for the work to reduce the CS profile size! It is something we really need.

I am also trying it in a slightly different way.
For "[main:3.1 @ _Z5funcBi]:120:19" in your example, I split the context string into multiple tuples of {string, line, discriminator}:
{main, 3, 1} {_Z5funcBi, 0, 0}, and for each name in the tuple, we will only save the index to the name table. So we will not have new entry in the name table for different contexts.

This way, we won't see any increase in the name table section compared with a non-CS profile, even when we use string-based names. We do need a new section, called CSNameTable, to store the tuples. When we read that section, we reconstruct the context string from the tuples, and the rest of the profile handling needs no change.

On top of this, we can also apply MD5 to the name table using the existing MD5 mechanism, to further compress the name table section.

I haven't finished the implementation, but I am close. It would be good to discuss how we converge our efforts here.
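A hedged sketch of the decomposition wmi describes, with illustrative names and field widths; each context frame becomes a tuple referencing the shared name table, and the tuples live in the new CSNameTable section:

  #include <cstdint>
  #include <vector>

  // One decomposed context frame: {name-table index, line, discriminator}.
  struct CSNameEntry {
    uint32_t NameIndex;     // index into the shared name table
    uint32_t Line;          // call-site line offset
    uint32_t Discriminator; // call-site discriminator
  };

  // "[main:3.1 @ _Z5funcBi]" decomposes into two entries:
  //   {index("main"), 3, 1} and {index("_Z5funcBi"), 0, 0}
  // so repeated function names across contexts share one name-table slot.
  using CSNameTableEntry = std::vector<CSNameEntry>;
  using CSNameTable = std::vector<CSNameTableEntry>;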

hoy added a comment. Jul 30 2021, 1:06 PM

> I am also trying it in a slightly different way. [...] It would be good to discuss how we converge our efforts here.

@wmi Good to know you are also working on improving the efficiency. Thanks!
The string-split design sounds interesting. I'm curious to know more about it, like how the context strings are split in the profile, and how they are constructed and represented in the compiler.

wenlei added a comment.

Thanks for working on this.

Decomposing context strings into two levels is a natural choice for deduplication. However, it means we would need to reconstruct the strings in the profile reader, so it is more of a space-vs-speed tradeoff. If compiler speed is the same, then storing decomposed strings in the file and reconstructing them on profile loading might be the better choice, as it is more compact and also more consistent with the non-CS profile. Perhaps it would be good to measure both. Right now, with MD5 profiles plus some context trimming through the cold threshold or pre-inliner tuning, we hope to bring e2e build time on par with AutoFDO for large workloads.

wmi added a comment. Aug 2 2021, 11:03 AM

> The string-split design sounds interesting. I'm curious to know more about it, like how the context strings are split in the profile, and how they are constructed and represented in the compiler.

Hi Hongtao, this is the implementation: https://reviews.llvm.org/D107299. It splits the context to deduplicate function names during profile writing, and reconstructs the context string during profile reading.

wmi added a comment. Aug 2 2021, 11:25 AM

> Decomposing context strings into two levels is a natural choice for deduplication. [...] Perhaps it would be good to measure both.

Right, D107173 on its own already lets me build our search benchmark in a distributed way, which I could not do before because of the OOM issue, and that is really helpful. https://reviews.llvm.org/D107299 will still run into OOM in distributed builds because we still have the full context strings after profile reading, and that consumes excessive memory. For D107299, I believe that issue can also be solved once I add MD5 support to it.

Like wenlei said, decomposing context strings gives us a more compact profile (deduplicating: 418M vs. MD5: 1.8G for the profile collected from our search test), but reconstruction takes more time (2x). I am thinking about whether reconstructing the context string is necessary at all, since the context represented in ContextTrieNode is also decomposed into multiple levels.

By the way, the context reconstruction time may also be reduced if D107299 is coupled with MD5.

hoy added a comment. Aug 2 2021, 12:18 PM

> Like wenlei said, decomposing context strings gives us a more compact profile (deduplicating: 418M vs. MD5: 1.8G for the profile collected from our search test), but reconstruction takes more time (2x). [...]

@wmi Thanks for sharing your change and the numbers! And thanks for trying my implementation. I'm interested whether you have seen a build time improvement with it?

I think name splitting is a promising idea. It's worth trying to not reconstruct the full context strings in the compiler. I talked with Wenlei offline: instead of a StringRef field in SampleContext, which also serves as the key to uniquely identify the context, we could use something like vector<StringRef>. There are likely more related issues; special handling might also be needed in some places for MD5 profiles, where not every MD5 code maps to a real function in the current module. Let us know if you want to give it a shot.
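A rough sketch of the direction floated here, with stub types (the real representation was settled in later patches): SampleContext carrying a vector of frames instead of one flattened string:

  #include <cstdint>

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/ADT/StringRef.h"

  // One decomposed frame: a function (real name or MD5 code) plus the
  // call-site location within its caller.
  struct ContextFrameStub {
    llvm::StringRef FuncName;
    uint32_t Line = 0;
    uint32_t Discriminator = 0;
  };

  // A context is then a frame stack rather than a flattened string: a single
  // frame for AFDO, the full call-site stack for CSSPGO.
  struct SampleContextStub {
    llvm::SmallVector<ContextFrameStub, 8> Frames;
  };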

wmi added a comment. Aug 2 2021, 2:03 PM

> @wmi Thanks for sharing your change and the numbers! And thanks for trying my implementation. I'm interested whether you have seen a build time improvement with it?

I didn't compare it with the case without MD5 because I cannot build successfully in our build system without MD5. I tried building locally, but it took one week to finish.

I tried dumping the profile to text format, and the MD5 version took about half the time to finish compared with the case without MD5, so that is an indication we should see a build time improvement with it.

> I think name splitting is a promising idea. [...] Let us know if you want to give it a shot.

Thanks for confirming the feasibility of the idea of not reconstructing the full context string in the compiler. I will try the idea.

wenlei added a comment.

> I tried dumping the profile to text format, and the MD5 version took about half the time to finish compared with the case without MD5, so that is an indication we should see a build time improvement with it.

Curious, were you trying the probe-based CS profile or the line-based CS profile?

> Thanks for confirming the feasibility of the idea of not reconstructing the full context string in the compiler. I will try the idea.

Yeah, eliminating the round-trip conversion is going to help. One of the reasons we chose to store the context as a single string was to accommodate how profiles are kept in the profile reader/writer using a StringMap. If we store context strings in decomposed form in the profile file, supporting a corresponding decomposed context representation as a first-class citizen inside the compiler, to avoid the conversion, would be great.

Now looking at the use of SampleProfileReader::getProfiles again, the key (name/context) string is actually redundant, since FunctionSamples contains the function name/context too. The StringMap is used by AFDO for point lookup by name. But for CSSPGO, lookup is served by SampleContextTracker, and the tracker is built by iterating over all profiles, so the full context string as the StringMap key isn't really useful for CSSPGO; it is more a choice made for consistency.

If we want to skip reconstructing the full context string from decomposed context input, we could change the string conversion for SampleContext to produce a dummy string (e.g., an MD5 string of the context?) instead of the reconstructed full context string, and use that as the key for the StringMap of profiles. Or we could change the interface of SampleProfileReader::getProfiles to return something like map<SampleContext, FunctionSamples>&, and always use a SampleContext object for lookup - the main content of SampleContext would be a StringRef (AFDO) or a vector of StringRefs (CSSPGO). We would also need to be able to order SampleContexts as before - that is used when loading a profile subtree to prepare for importing.
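A sketch of the second option, again with illustrative stub types: profiles keyed by an ordered SampleContext, where a lexicographic frame order supplies the stable ordering needed for subtree loading:

  #include <cstdint>
  #include <map>
  #include <string>
  #include <tuple>
  #include <vector>

  struct ContextFrame {
    std::string FuncName;
    uint32_t Line = 0, Discriminator = 0;
    bool operator<(const ContextFrame &O) const {
      return std::tie(FuncName, Line, Discriminator) <
             std::tie(O.FuncName, O.Line, O.Discriminator);
    }
  };

  struct SampleContextKey {
    std::vector<ContextFrame> Frames; // one frame for AFDO, a stack for CSSPGO
    bool operator<(const SampleContextKey &O) const {
      return Frames < O.Frames; // lexicographic: stable subtree ordering
    }
  };

  struct FunctionSamplesStub { uint64_t TotalSamples = 0; };

  // The getProfiles interface would then hand back this map directly.
  using ProfileMapTy = std::map<SampleContextKey, FunctionSamplesStub>;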

wmi added a comment. Aug 3 2021, 10:02 AM

> Curious, were you trying the probe-based CS profile or the line-based CS profile?

I was trying the probe-based CS profile.

> If we want to skip reconstructing the full context string from decomposed context input, we could change the string conversion for SampleContext to produce a dummy string [...] Or we could change the interface of SampleProfileReader::getProfiles to return something like map<SampleContext, FunctionSamples>&, and always use a SampleContext object for lookup. [...]

Thanks for the suggestion. That is very helpful.

wmi added a comment. Aug 4 2021, 9:35 PM

> I think name splitting is a promising idea. It's worth trying to not reconstruct the full context strings in the compiler. [...] Let us know if you want to give it a shot.

I tried the idea of using vector<Callsite> (Callsite being a struct containing a function name and a LineLocation) in SampleContext and changed the ProfileMap to map<SampleContext, FunctionSamples> over the last two days. I believe the idea should work, but I found the required change was much larger than I thought, since ProfileMap is used in many interfaces. I am worried that for the moment I cannot devote enough time to finish the whole change, so for now I tend to use Hongtao's MD5 patch and leave the further improvement for the future, since the current patch has already significantly improved the build process and unblocked our experiments. We can always come back and revisit it. What is your opinion?

hoy added a comment. Aug 5 2021, 11:47 AM

> I tried the idea of using vector<Callsite> (Callsite being a struct containing a function name and a LineLocation) in SampleContext and changed the ProfileMap to map<SampleContext, FunctionSamples> over the last two days. [...] We can always come back and revisit it. What is your opinion?

Thanks for the heads-up, Wei. Sounds good to stick with this patch for now if you need more time to complete your idea, which I think will get us a bigger win and be more aligned with the non-CS implementation. Will you continue working on it?

wmi added a comment. Aug 5 2021, 12:29 PM

> Thanks for the heads-up, Wei. Sounds good to stick with this patch for now [...] Will you continue working on it?

Probably I cannot work on it this quarter, so if you would like to work on it, feel free to take D107299 and I will be happy to review it. If you think getting this patch in as a short-term solution is preferred, I can start reviewing it.

hoy added a comment. Aug 10 2021, 2:43 PM

> Probably I cannot work on it this quarter, so if you would like to work on it, feel free to take D107299 and I will be happy to review it. [...]

Thanks for the heads-up, @wmi. I will take your current change and work from there.

wenlei added a comment.

> Thanks for the heads-up, @wmi. I will take your current change and work from there.

Sounds good. Split names could be a better solution. Btw, I suggest we separate out the context tracker change to use MD5 - it should be orthogonal.

hoy abandoned this revision. Aug 31 2021, 4:22 PM