This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie
ClosedPublic

Authored by wlei on Jun 3 2022, 4:08 PM.

Details

Summary

This is the followup patch to https://reviews.llvm.org/D125246 for the SampleContextTracker part. Before the promotion and merging of the context is based on the SampleContext(the array of frame), this causes a lot of cost to the memory. This patch detaches the tracker from using the array ref instead to use the context trie itself. This can save a lot of memory usage and benefit both the compiler's CS inliner and llvm-profgen's pre-inliner.

One structure needs to be specially treated is the FuncToCtxtProfiles, this is used to get all the functionSamples for one function to do the merging and promoting. Before it search each functions' context and traverse the trie to get the node of the context. Now we don't have the context inside the profile, instead we directly use an auxiliary map ProfileToNodeMap for profile , it initialize to create the FunctionSamples to TrieNode relations and keep updating it during promoting and merging the node.

Moreover, I was expecting the results before and after remain the same, but I found that the order of FuncToCtxtProfiles matter and affect the results. This can happen on recursive context case, but the difference should be small. Now we don't have the context, so I just used a vector for the order, the result is still deterministic.

Measured on one huge size(12GB) profile from one of our internal service. The profile similarity difference is 99.999%, and the running time is improved by 3X(debug mode) and the memory is reduced from 170GB to 90GB.

Diff Detail

Event Timeline

wlei created this revision.Jun 3 2022, 4:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2022, 4:08 PM
wlei requested review of this revision.Jun 3 2022, 4:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2022, 4:08 PM
wlei updated this revision to Diff 434218.Jun 3 2022, 5:49 PM

Updating D127031: [CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie

wlei edited the summary of this revision. (Show Details)Jun 3 2022, 5:49 PM
wlei added reviewers: hoy, wenlei.
hoy added a comment.Jun 6 2022, 10:50 AM

Would be good to have some numbers for build time and memory usage in the summary.

llvm/include/llvm/ProfileData/SampleProf.h
688

The field is only useful during context manipulation. Is it possible to store the link in an auxiliary map whereever it is used?

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
93

may still be helpful to keep the debug dump?

219

Is this also done in populateFuncToCtxtMap?

452

nit: use SampleContextFrameVector

wenlei added inline comments.Jun 6 2022, 11:37 PM
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
106

Is std::set unnecessary or is it wrong?

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
372

Why we need to check MergedContext on OldContext, but InlinedContext on ContextOnNode?

454

return string();

wlei added inline comments.Jun 9 2022, 9:21 PM
llvm/include/llvm/ProfileData/SampleProf.h
688

I think we use context for two things 1)context manipulation(inlining) 2) reader and writer based on ProfileMap.
My thought is ContextNode is the same thing to FullContext, i,e. we can always start from this node to iterate trie reversely to get the array of frames. I was wondering if we can use ContextNode to replace the FullContext in the future. And for sample reader and writer things, the key of the map can work as the FullContext, we might not need this field either. That's why I'm thinking to keep it here. WDYT?

llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
106

Before it sorted the elements by A->getContext() < B->getContext();, but now getContext is an empty context, so it ends up with non-deterministic issue and losing elements. While we can recover the context by waling through the trie, but thinking using SmallVector is enough here, the order of traversing trie is deterministic(getAllChildContext() is deterministic).

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
93

moveToChildContext is the function of ContextTrieNode, the getContextString is in SampleContextTracker. Then it requires to pass SampleContextTracker to the moveToChildContext function.

I just realize before it dump the promotion result for each node, but now we don't have the concept of promotion. Maybe dumping the the head node(ToNode) is enough?

219

Good catch!

372

Let me demo by an example.

Given two node with same name foo, say N1 and N2. N1 own a FunctionSample S1, and N2 own S2, i, e. S1->N1, S2->N2

Supposing during merge bar, foo is also merge, say N2 --> N1, then S2's context is marked MergedContext and S2's is update to point to N1. S1 still point to N1.

Then when call getBaseSample for foo to merge all nodes, S2 and S1 both point to N1 now, we can just skip S2 which contains the OldContext marked MergedContext .

452

Done, thanks!

454

Done, thanks!

wlei edited the summary of this revision. (Show Details)Jun 9 2022, 9:28 PM
wlei updated this revision to Diff 435794.Jun 9 2022, 9:29 PM
wlei edited the summary of this revision. (Show Details)

Addressing reviewers' feedback. Also added the build time and memory usage number in the summary.

hoy added inline comments.Jun 9 2022, 11:39 PM
llvm/include/llvm/ProfileData/SampleProf.h
688

yeah, I had the same thought that the two fields are kinda redundant to each other. I'm a bit worried about the compile-time memory increase. Would it be possible to use a union for the two fields for now?

  1. reader and writer based on ProfileMap.

What is this usage?

I was wondering if we can use ContextNode to replace the FullContext in the future.

It depends. For example, the profile merger probably does not want to always build a tri and merge multiple tries. The profile similarity checker may neither.

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
93

Yeah, I don't have a strong opinion, as long as it makes your debug experience better :)

wlei updated this revision to Diff 436104.Jun 10 2022, 9:50 PM

Make a union type for fullcontext(array) and context trie node.

wlei added inline comments.Jun 10 2022, 9:55 PM
llvm/include/llvm/ProfileData/SampleProf.h
688

Would it be possible to use a union for the two fields for now?

Yeah, that's good to save memory. I had a try this, see the current version, it can pass all the test. However, I feel like it might be error-prone. The two shared fields have different way of initialization, currently I need to make sure SampleContextFrames is initialized then the ContextTrieNode field will be unknown status(not null). We might need to carefully use it in the future.

reader and writer based on ProfileMap.
What is this usage?

For CS profile, SampleContext is the key of ProfileMap, FullContext is always not empty here.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/ProfileData/SampleProfReader.cpp#L279

It depends. For example, the profile merger probably does not want to always build a tri and merge multiple tries. The profile similarity checker may neither.

Yeah, right now ProfileMap always require the FullContext as the key. wondering if we can only use FullContext as the key, FunctionSample's SampleContext doesn't own the fullContext.

hoy added inline comments.Jun 11 2022, 10:51 AM
llvm/include/llvm/ProfileData/SampleProf.h
688

Using union is error-prone, yes. My original thought is to use a separate map in ContextTracker that maintains the link from FunctionSamples to its ContextTrieNode. So far I don't see a use of getting ContextTrieNode from FunctionSamples out of the ContextTracker context. It seems encapsulating the map inside ContextTracker is enough. Does this make sense to you?

llvm/tools/llvm-profgen/ProfileGenerator.cpp
1021

What are the allocations to be undone? I thought in D125246 we use std:move to transmit the FunctionSamples on the tri to profileMap.

wlei updated this revision to Diff 436453.Jun 13 2022, 9:59 AM
wlei added a comment.Jun 13 2022, 9:59 AM

Updating D127031: [CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie

llvm/include/llvm/ProfileData/SampleProf.h
688

I see, now I understand why to keep in a separate map, it's less intrusive to the FunctionSamples class. Thanks for your clarification.

llvm/tools/llvm-profgen/ProfileGenerator.cpp
1021

I was thinking the std::move 's source object should be freed, it's allocated in CSProfileGenerator::getOrCreateFunctionSamples and it's allocated by new, so we need explicitly delete, though the memory usage should be small.

hoy added a comment.Jun 13 2022, 11:47 AM

A general question about the design as I read more into the code. I was trying to figure out if a FunctionSample, its SampleContext and ContextTrieNode stick together as a whole, as also commented in another place. It seems to me multiple FunctionSamples can share the ContextTrieNode. Am I correct? What is the reason for this?

llvm/include/llvm/ProfileData/SampleProf.h
688

Yeah, it looks better now. Thanks for addressing this.

llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
178

type: are -> is

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
364

Should we also check InlinedContext here?

369

I'm a bit confused here. Is ContextOnNode same with OldContext? Should a FunctionSample, its SampleContext and ContextTrieNode stick together as a whole? It may be worth a comment here.

wlei added inline comments.Jun 13 2022, 11:59 AM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
364

Answering in the comment below.

369

Supposing Node(NA) owned a FunctionSamples(SA) and NB owns SB, NA is merged into NB.

SA's context is marked MergedContext and Now SA's node is pointed to NB(NA is removed)

so here OldContext is the context of NA, ContextOnNode is the context of NB(might be marked InlinedContext in the future).

will comment using this example.

Should a FunctionSample, its SampleContext and ContextTrieNode stick together as a whole

It seems they can't be sticked together if we want to keep the old status(MergedContext) and the node pointer always point to the new on trie node.

hoy added inline comments.Jun 13 2022, 12:02 PM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
369

SA's context is marked MergedContext and Now SA's node is pointed to NB(NA is removed)

Why SA's should point to NB? Can NA be kept so that SA always point to it?

When a context is merged to another context, the context should never be used anywhere. What's the benefit of making it point to the other tri node?

wlei added inline comments.Jun 13 2022, 12:31 PM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
369

We need to update FSamples to always point to the new ContextTrieNode, that's why we use updateContextNode. This is because we iterate the FuncToCtxtProfiles which is a list of FSamples to be merged to get base sample. Since the the node can be merged dynamically, take the previous example. the NB after merging is the sum of (SA + SB), NA is the old SA, it should be removed. but when to get FA's Node, if we get the old SA, it will be merged to NA again becoming (2*SA +SB). so we need to always make point to the node on the trie. And now both SA and SB are pointing to NB, then here use the oldcontext(MergedContext ) to filter out one.

hoy added inline comments.Jun 13 2022, 1:39 PM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
369

but when to get FA's Node, if we get the old SA, it will be merged to NA again becoming (2*SA +SB).

At this time. SA's context should be marked "MergedContext". Can we skip it based on the flag?

wlei added inline comments.Jun 13 2022, 1:44 PM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
369

Yeah, that's just done here, that's to skip the FSample(SA) whose old context(NA) is marked MergedContext .

hoy added inline comments.Jun 13 2022, 1:46 PM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
369

So do we need to follow SA's link to its context tri node? Feel we wouldn't never need to access SA anymore since all of its data are already merged into another node.

hoy added inline comments.Jun 13 2022, 1:48 PM
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
187

Actually my previous question could be made more clear as why we need to maintain this map.

wlei added inline comments.Jun 13 2022, 3:35 PM
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
187

I see, so you're suggested that
1 FSamples2NodeMap make sure FSamples always pointed to its belonging context node.
2 even the FSamples whose context is mergedcontext point to a stale trie node, we can filter out by the

SampleContext &OldContext = CSamples->getContext();
// Check to avoid re-merge any context.
if (OldContext.hasState(MergedContext))

So we won't have a redundant node to be merged. We also don't need to update FSample to point to the merged node.

That sounds a clear way!

hoy added inline comments.Jun 13 2022, 3:44 PM
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
187

Exactly. Hopefully the logic can be simplified a little bit.

wlei updated this revision to Diff 436610.Jun 13 2022, 5:23 PM

addressing feedback from Hongtao

hoy added inline comments.Jun 13 2022, 9:04 PM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
369

ContextOnNode and OldContext should be the same thing now?

592

This is no longer needed?

llvm/tools/llvm-profgen/CSPreInliner.cpp
311

Should we just keep the original condition checks?

Also, where do we actually deallocate the function profile?

llvm/tools/llvm-profgen/ProfiledBinary.h
164

nit: use const for the parameter

wlei updated this revision to Diff 436652.Jun 13 2022, 10:22 PM

Updating D127031: [CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
369

Yeah, new the context node is always point to the FunctionSamples, and vice versa. Recovered the original code.

592

This was to be consistent with the branch below(line:608), I just realized that actually the FromNode will be removed from the trie, they both can be removed, removing them.

llvm/tools/llvm-profgen/CSPreInliner.cpp
311

Should we just keep the original condition checks?

do you mean the isBaseContext?, it depends on the FullContext, now we don't have it, so I changed to Node->getParentContext() != &ContextTracker.getRootContext()

Also, where do we actually deallocate the function profile?

It's a bit tricky, the FunctionSample can be from two sources

  1. the llvm-sample-profile, the FunctionSamples's lifetime will be along with the reader. we can't explicitly call delete on this.
  1. ProfileGenerator( by the new), we need to explicitly call delete, I haven't came up with a way to deallocate this yet. The Node is removed from trie dynamically, I was thinking to use FuncToCtxtProfiles to get all FunctionSamples's pointer but this doesn't include the base context.

    (I leave a TODO for this)
llvm/tools/llvm-profgen/ProfiledBinary.h
164

Done!

hoy added inline comments.Jun 13 2022, 11:29 PM
llvm/tools/llvm-profgen/CSPreInliner.cpp
311

it depends on the FullContext, now we don't have it, so I changed to Node->getParentContext() != &ContextTracker.getRootContext()

I see. That makes sense.

It's a bit tricky, the FunctionSample can be from two sources

It is tricky.

Do we have any memory leakage by resetting the pointer to null but not freeing the data? If so, we may fail the sanitizer tests.

wlei added inline comments.Jun 13 2022, 11:50 PM
llvm/tools/llvm-profgen/CSPreInliner.cpp
311

Okay.. The FuncToCtxtProfiles also hold the pointers but eventually after ContextTracker is freed, it's still memory leakage.. let me try to free them.

wlei updated this revision to Diff 436824.Jun 14 2022, 9:33 AM

free memory allocated by llvm-profgen

hoy added inline comments.Jun 14 2022, 12:12 PM
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
150

PreInliner is a llvm-profgen concept, thus it should not be exposed here.

Also the deallocation should be done by the original allocator, i.e, llvm-profgen.

llvm/tools/llvm-profgen/ProfileGenerator.cpp
1031

I'm wondering if these two deallocations can be combined. Maybe in buildProfileMap we could exclude merged contexts there and then we don't need to do Node->setFunctionSamples(nullptr); inside CSPreInliner::run?

wlei added inline comments.Jun 14 2022, 12:29 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
1031

The merged contexts node is removed from the trie after merging, so we can't traverse the whole trie to delete all the FSamples. That's why I was thinking to leverage the FuncToCtxtProfiles which stores all the FSample pointers at the beginning. And if it's not the preinliner pass, FuncToCtxtProfiles is empty, then we need to traverse the trie to delete them.

wlei updated this revision to Diff 436965.Jun 14 2022, 3:53 PM

move memory free code inside llvm-profgen

hoy added inline comments.Jun 14 2022, 5:15 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
1031

The merged contexts node is removed from the trie after merging, so we can't traverse the whole trie to delete all the FSamples.

I see. I guess we have to keep the diversion for now. A shared_ptr or a centralized FunctionSample storage such as what the reader maintains would typically help. That would require a change in CSProfileGenerator::getOrCreateFunctionSamples . Please leave a todo somewhere to be addressed laster.

wlei added inline comments.Jun 14 2022, 6:22 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
1031

Good idea, I feel that's something feasible in this patch. Based on your idea to change getOrCreateFunctionSamples , I'm wondering if we can just use a vector to store all the FunctionSamples, the trie will naturally make sure the node is unique, so we don't need a Map which is our concern before.

hoy added inline comments.Jun 14 2022, 6:52 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
1031

Yeah, a std::list would work. std::vector can be volatile when growing.

wlei added inline comments.Jun 14 2022, 6:56 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
1031

Yeah, thanks for the suggestion! I will implement it in https://reviews.llvm.org/D125246, that's the first diff
introduce the memory leak.

wlei updated this revision to Diff 437028.Jun 14 2022, 9:04 PM

rebase, memory leak issue is addressed in the https://reviews.llvm.org/D125246

hoy added a comment.Jun 14 2022, 11:09 PM

Thanks for addressing all the comments! Looks good to me now except for a couple minor things.

llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
93–95

Wondering if 16 can be larger than needed in general. Since we are aiming for mem savings, maybe a smaller value is better? Or just use std::vector. Shouldn't be a big deal though.

150

Make this debug-only?

wlei updated this revision to Diff 437046.Jun 14 2022, 11:42 PM

Updating D127031: [CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie

llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
93–95

Sounds good, used std::vector

150

Done, thanks!

hoy accepted this revision.Jun 15 2022, 9:26 AM

LGTM.

This revision is now accepted and ready to land.Jun 15 2022, 9:26 AM
wenlei added inline comments.Jun 23 2022, 5:54 PM
llvm/include/llvm/ProfileData/SampleProf.h
688

I also like this being an auxiliary data rather than extending with a field. Thanks for working on this

llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
93–94

nit: this comment is more like an explanation of the change (why we no longer need container to order elements), based on previous comment as context. but reader will never have previous comment/code as context.

Saying this because when I read this comment, I actually got confused, and I only understand it after reading the original code..

179

nit: to be consistent with FuncToCtxtProfiles, name this ProfileToNodeMap

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
69

DeleteNode is always false and can be simplified?

87

any reason to make this a member of SampleContextTracker rather than member of ContextTrieNode? asking because the naming moveToChildContext is assuming this is the parent to move to.

227

I'm seeing many repetition of tree BFS and the boiler plate code. Perhaps it makes sense to create iterator API for SampleContextTracker to go over all nodes.

llvm/tools/llvm-profgen/CSPreInliner.cpp
65

populateFuncToCtxtMap isn't really the responsibility of preinliner. can we move this earlier to right after profile generation on trie?

preinliner should be one client of context tracker and it can assume complete, fully functioning context tracker, including auxiliary maps.

wenlei added inline comments.Jun 23 2022, 5:59 PM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
87

Ok, I guess it's because setContextNode needs to update the map for context tracker. In that case, we probably want to make context tracker own the move completely, because ContextTrieNode::moveToChildContext now isn't complete without updating the map. What I'm suggesting is to move ContextTrieNode::moveToChildContext into this function completely.

Also rename SampleContextTracker::moveToChildContext to something like ContextTrieNode::moveContextSamples since this is no longer the parent to move to.

wlei updated this revision to Diff 440449.Jun 27 2022, 5:51 PM

Addressing Wenlei's feeback: add an iterator API for SampleContextTracker and other refactors.

Thanks for the change, and the iterator cleans up quite a bit of duplication. There's one more case left, see inline comment.

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
100

We can use the iterator for this as well?

wlei added inline comments.Jun 27 2022, 7:32 PM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
100

We can use iterator for this. But we need to call setParentContext to set Parent for all childNode, there is still the below for loop inside, it's not like others very straightforward good for the readability. WDYT?

for(.....) {
   ....
  for (auto &It : Node->getAllChildContext()) {
    ContextTrieNode *ChildNode = &It.second;
    ChildNode->setParentContext(Node);
 }
wenlei accepted this revision.Jun 27 2022, 10:08 PM

lgtm, thanks.

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
100

Ok, then it works either way. Keep current version is also good because this loop involves modification.

wlei edited the summary of this revision. (Show Details)Jun 27 2022, 11:18 PM
This revision was landed with ongoing or failed builds.Jun 27 2022, 11:30 PM
This revision was automatically updated to reflect the committed changes.