This is an archive of the discontinued LLVM Phabricator instance.

Differential D140908

[MemProf] Context disambiguation cloning pass [patch 1a/3]
ClosedPublic

Authored by tejohnson on Jan 3 2023, 10:01 AM.

Details

Reviewers

snehasish
davidxl

Commits

rG700cd99061ed: Restore "[MemProf] Context disambiguation cloning pass [patch 1a/3]"
rGd6ad4f01c3da: [MemProf] Context disambiguation cloning pass [patch 1a/3]

Summary

Support for building, printing, and displaying CallsiteContextGraph
which represents the MemProf metadata contexts. Uses CRTP to enable
support for both IR (regular LTO) and summary (ThinLTO). This patch
includes the support for building it in regular LTO mode (from
memprof and callsite metadata), and the next patch will add the
handling for building it from ThinLTO summaries.

Also includes support for dumping the graph to text and to dot files.

Follow-on patches will contain the support for cloning on the graph and
in the IR.

The graph represents the call contexts in all memprof metadata on
allocation calls, with nodes for the allocations themselves, as well as
for the calls in each context. The graph is initially built from the
allocation memprof metadata (or summary) MIBs. It is then updated to
match calls with callsite metadata onto the nodes, updating it to
reflect any inlining performed on those calls.

Each MIB (representing an allocation's call context with allocation
behavior) is assigned a unique context id during the graph build. The
edges and nodes in the graph are decorated with the context ids they
carry. This is used to correctly update the graph when cloning is
performed so that we can uniquify the context for a single (possibly
cloned) allocation.
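
As a rough illustration of the context id scheme described above, the following minimal sketch assigns one id per MIB and records it on every caller-to-callee edge of that MIB's call stack. All names here (SketchMIB, SketchGraph, addMIB, idsOnEdge) are hypothetical and are not taken from the patch; the real CallsiteContextGraph is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for one MIB: a call stack (allocation
// first, outermost caller last) plus whether the context was cold.
struct SketchMIB {
  std::vector<std::string> Stack;
  bool Cold;
};

// Hypothetical graph: each caller->callee edge carries the set of context
// ids that flow through it.
struct SketchGraph {
  uint32_t LastContextId = 0;
  std::map<uint32_t, bool> ContextIsCold;
  std::map<std::pair<std::string, std::string>, std::set<uint32_t>> EdgeIds;

  // Assign a fresh non-zero id to the MIB and record it on every edge of
  // its call stack.
  uint32_t addMIB(const SketchMIB &MIB) {
    uint32_t Id = ++LastContextId;
    ContextIsCold[Id] = MIB.Cold;
    for (size_t I = 0; I + 1 < MIB.Stack.size(); ++I)
      EdgeIds[std::make_pair(MIB.Stack[I + 1], MIB.Stack[I])].insert(Id);
    return Id;
  }

  // Context ids carried by the edge from Caller to Callee (empty if none).
  std::set<uint32_t> idsOnEdge(const std::string &Caller,
                               const std::string &Callee) const {
    auto It = EdgeIds.find(std::make_pair(Caller, Callee));
    if (It == EdgeIds.end())
      return {};
    return It->second;
  }
};
```

With two contexts that share the allocation's immediate caller, the shared edge carries both ids while each outer edge carries only one, which is the kind of information a later cloning step needs in order to separate the contexts.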

Depends on D140786.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,030 ms	x64 debian > libFuzzer.libFuzzer::minimize_crash.test

Event Timeline

tejohnson created this revision. Jan 3 2023, 10:01 AM

Herald added a project: Restricted Project. Jan 3 2023, 10:01 AM

Herald added subscribers: ormris, arphaman, steven_wu and 2 others.

tejohnson requested review of this revision. Jan 3 2023, 10:01 AM

Examples of the dot files (for new test llvm/test/Transforms/PGHOContextDisambiguation/basic.ll), also converted to png format, are attached:
basic.ll.tmp.ccg.prestackupdate.dot[.png] is the dot file generated after adding all allocation nodes, and the stack nodes corresponding to the memprof metadata on those allocations.
basic.ll.tmp.ccg.postbuild.dot[.png] is the dot file generated after matching callsite metadata from other calls to the stack nodes.

The color scheme on the nodes/edges: blue = cold, red = noncold, purple = both
In the dot file format there is a tooltip that shows the assigned context ids for each node and edge.

basic.ll.tmp.ccg.prestackupdate.dot (1 KB)

basic.ll.tmp.ccg.postbuild.dot (1 KB)

Harbormaster completed remote builds in B205487: Diff 486018. Jan 3 2023, 10:50 AM

tejohnson added a child revision: D140949: [MemProf] Context disambiguation cloning pass [patch 2/3]. Jan 3 2023, 8:25 PM

davidxl added inline comments. Jan 3 2023, 10:17 PM

llvm/include/llvm/ADT/SetOperations.h
60 ↗	(On Diff #486018)	Keep the name of the interface 'set_intersect'. To allow overloading, make the return a reference parameter.
93 ↗	(On Diff #486018)	nit: it feels better to keep the name of the interface set_subtract. For symmetry, make the 'removed' a reference parameter.
llvm/include/llvm/IR/ModuleSummaryIndex.h
318	dumping utilities can be split out. Similarly for new interfaces like 'last', or set_subtract .
llvm/test/Transforms/PGHOContextDisambiguation/basic.ll
30	why is the option called '-lifetime-cold-'? should it be '-lifetime-short-'?

Enna1 added a subscriber: Enna1. Jan 8 2023, 6:59 PM

Update with a couple of fixes made when testing on a large target.

I will add tests for the fixes, but wanted to commit the code changes
right away for the review.

One of the fixes necessitated pulling the removed edges handling up from
the follow on patch and expanding it. There is another fix to avoid
infinite recursion and a couple other minor fixes.

Still reviewing, here are some initial comments.

llvm/include/llvm/ADT/SetOperations.h
60 ↗	(On Diff #486018)	If we make the return a reference (eg. `S1Ty&`) this will break since we are returning a temporary. Returning `const S1Ty&` would work due to reference lifetime extension but constrains what we can do with the result.
llvm/include/llvm/Analysis/MemoryProfileInfo.h
131 ↗	(On Diff #486018)	nit: call this back() to be consistent with iterator method names?
llvm/include/llvm/IR/ModuleSummaryIndex.h
318	+1
992	The behaviour of the const and mutable versions of the callsites function differs. In `callsites`, if it's null it returns an empty array ref; for mutableCallsites it may assert (or segfault on an opt build). Should we avoid the assert by allocating storage instead?
llvm/lib/Transforms/IPO/PGHOContextDisambiguation.cpp
2	So far we've used memprof to refer to all of the prior work, I would strongly prefer we continue using the same to make it easy to discover all the related code by just searching for a single term. This is also useful for listing all the options related to memprof which start with the "memprof-" prefix. Wdyt?
60	nit: Expand CCG to CallingContextGraph here and below.
69	Can we move this to MemoryProfileInfo.h and use it here as well as MemoryProfileInfo.cpp where it exists as a static function?
123	I guess inheriting from std::pair provides equality operators etc? Is there any other reason to derive from std::pair as compared to just defining the members inline. nit: struct FuncInfo final (same for CallInfo below)
143	nit: IMO `operator bool()` is less cryptic than `(bool)*this`
873	This lambda is doing a lot of work, can we move it out to its own method? Its size makes it a little hard to read right now.
897	auto& to avoid a copy?
1103	Twine(CloneNo) should incur fewer conversions. "pgho" suffix would change if you consider the comment about pgho/memprof above.
1213	Can we split this patch to only implement the basic version (non-ThinLTO)?
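
The comments above on lines 123 and 143 concern deriving from std::pair to pick up its comparison operators, and spelling the boolean test as an `operator bool()`. A minimal sketch of that pattern follows; SketchCallInfo and its members are hypothetical names chosen for illustration and do not reproduce the patch's FuncInfo or CallInfo.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the FuncInfo/CallInfo idea: a (call, clone
// number) pair that inherits ==, != and < from std::pair instead of
// redefining them.
struct SketchCallInfo final : public std::pair<const char *, unsigned> {
  using Base = std::pair<const char *, unsigned>;

  SketchCallInfo() : Base(nullptr, 0) {}
  SketchCallInfo(const char *Call, unsigned CloneNo = 0)
      : Base(Call, CloneNo) {}

  // Explicit conversion: true when a call is present.
  explicit operator bool() const { return first != nullptr; }

  const char *call() const { return first; }
  unsigned cloneNo() const { return second; }
};
```

Because the comparison operators for std::pair are function templates, they also accept the derived type, so the struct can be used as a map key or compared directly without any extra code.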

Harbormaster completed remote builds in B206879: Diff 487920. Jan 10 2023, 3:46 PM

I'll move on to the other patches in the stack and then come back to this one, thanks for your patience!

llvm/lib/Transforms/IPO/PGHOContextDisambiguation.cpp
916	nit: s/reverse/descending?
957	I didn't understand how we can have this case with different MIB contexts here. Can you elaborate in the comment?
978	I think LastNode is redundant and CurNode can be used in L1016 and L1024.
1501	Prefer incrementing this iterator outside the for loop. It was a little strange to see EI++ in the initialization and increment step.
1597	It would be good to use llvm/Support/GraphWriter.h to export as dotGraph instead of re-implementing things here. You may need to use DOTGraphTraits.h too to implement some things such as tooltips which are used here. This is probably a significant change but it would reduce the amount of code we have to maintain in this file.

snehasish added inline comments. Jan 13 2023, 11:29 AM

llvm/lib/Transforms/IPO/PGHOContextDisambiguation.cpp
775	missing = in the comment, i.e. `/*TowardsCallee=*/`. There are a few other cases in this patch. https://clang.llvm.org/extra/clang-tidy/checks/bugprone/argument-comment.html
806	Can we use shared_ptr for the edges so that we don't have to manually track ownership? (also mentioned in the other patch)

Thanks for the comments. I haven't had a chance to address them yet, but most of the suggestions make sense I think. I added just a few responses here.

llvm/include/llvm/ADT/SetOperations.h
60 ↗	(On Diff #486018)	I think it should stay with a value return, not a reference, also to be consistent with other functions in this file that return the result set by value. Also, I do want the name to be different (also to be consistent with what is done elsewhere in this file - see e.g. set_subtract vs set_difference) - I think it is clearer to have a different name for the function that returns the result vs updating the first parameter.
llvm/include/llvm/IR/ModuleSummaryIndex.h
992	I'd prefer not to allocate, based on where and how this is used. Since it is returning a reference, I can't have it return an empty vector like callsites().
llvm/lib/Transforms/IPO/PGHOContextDisambiguation.cpp
2	ack - I was wondering the same thing myself, will do a rename.
123	Yes, it is for inheriting the operators, rather than redefining them. This same trick is used elsewhere in llvm.
806	Yes we could. I have some concern about the additional overhead and whether it is worth it given that tracking these didn't end up being too difficult. I'll collect some measurements, it might be small given the overhead of all the rest of the graph memory.
1213	Will do for commit - as we discussed offline, at this point I may do so after the review which is already in progress here.
1597	Thanks for that pointer, will do. I copied the approach used by ModuleSummaryIndex, which seems like it should eventually be migrated to using the GraphWriter as well!
llvm/test/Transforms/PGHOContextDisambiguation/basic.ll
30	This is the minimum lifetime required in order to mark the context as cold when we do profile matching.
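
The SetOperations.h thread above turns on the difference between a helper that updates its first argument in place and one that returns the result set by value. A minimal sketch of the two shapes, using std::set and hypothetical `sketch_` names rather than the actual LLVM helpers, might look like this:

```cpp
#include <cassert>
#include <set>

// In-place form (hypothetical name): shrinks S1 to the elements that are
// also present in S2.
template <class S1Ty, class S2Ty>
void sketch_set_intersect(S1Ty &S1, const S2Ty &S2) {
  for (auto I = S1.begin(); I != S1.end();) {
    if (!S2.count(*I))
      I = S1.erase(I);
    else
      ++I;
  }
}

// Value-returning form (hypothetical name): leaves both inputs untouched.
// The result is a local object, so it is returned by value; returning a
// non-const reference to it would dangle.
template <class S1Ty, class S2Ty>
S1Ty sketch_set_intersection(const S1Ty &S1, const S2Ty &S2) {
  S1Ty Result;
  for (const auto &E : S1)
    if (S2.count(E))
      Result.insert(E);
  return Result;
}
```

Giving the two forms different names, as argued in the reply above, makes it visible at the call site whether the first argument is modified.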

tejohnson mentioned this in D143184: [MemProf] Add helper to access the back (last) call stack id. Feb 2 2023, 7:03 AM

tejohnson added inline comments. Feb 2 2023, 7:07 AM

llvm/include/llvm/Analysis/MemoryProfileInfo.h
131 ↗	(On Diff #486018)	I split this out into D143184, and renamed last() to back() as suggested.
llvm/lib/Transforms/IPO/PGHOContextDisambiguation.cpp
806	I have been playing around with using shared_ptr for the ContextEdges. For some reason, the memory is really blowing up. I expect some increase, but not what I am seeing. I confirmed I am using make_shared which should minimize the overhead, and I don't see where I would be creating extra copies that live somewhere longer than expected, but I feel like I must be doing something wrong. Still looking...

tejohnson mentioned this in rG6827c4f0dea1: [MemProf] Add helper to access the back (last) call stack id. Feb 3 2023, 7:51 AM

tejohnson mentioned this in D144220: New SetOperations and unittesting for all SetOperations. Feb 16 2023, 2:44 PM

tejohnson mentioned this in rGbf0f94a5cf82: New SetOperations and unittesting for all SetOperations. Feb 17 2023, 7:18 AM

tejohnson mentioned this in D144314: [MemProf] Add printing utilities for MemProf summary structures. Feb 17 2023, 6:12 PM

tejohnson mentioned this in D144318: [MemProf] Make hasSingleAllocType helper non-static. Feb 17 2023, 6:48 PM

tejohnson mentioned this in rG5fd82ca05b46: [MemProf] Make hasSingleAllocType helper non-static. Feb 21 2023, 12:00 PM

tejohnson mentioned this in rG200034978b95: [MemProf] Add printing utilities for MemProf summary structures. Feb 22 2023, 6:25 AM

Rebase (to pick up changes refactored out in D144220, D144318, and D144314).

Harbormaster completed remote builds in B215629: Diff 500018. Feb 23 2023, 6:18 PM

I'm about to update the patch with the version that uses shared_ptr for the edges. I'm also using unique_ptr to simplify management of ContextNodes, where I found some leaks happening while debugging my shared_ptr usage. Also made a change to print the context ids in sorted order to avoid spurious test case changes going forward, but that is going to result in some test changes with this update.

Patch not yet ready for re-review, I am working on addressing the other comments while merging in some fixes I made. I also have not yet updated the follow on patches to reflect the shared_ptr changes, will do that later.

llvm/lib/Transforms/IPO/PGHOContextDisambiguation.cpp
806	I found and fixed a bug causing this blow up. The shared_ptr memory seems reasonable.

Use shared_ptr for edges, unique_ptr for nodes, and print context ids in order.
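
The ownership split mentioned in this update (shared_ptr for edges, unique_ptr for nodes) can be sketched as follows. This is only an assumption about the general shape: SketchNode, SketchEdge, SketchOwnedGraph, createNode and connect are hypothetical names, not the patch's ContextNode/ContextEdge implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>
#include <utility>
#include <vector>

struct SketchNode;

// Hypothetical edge: non-owning pointers to both endpoints plus the
// context ids it carries.
struct SketchEdge {
  SketchNode *Callee;
  SketchNode *Caller;
  std::set<uint32_t> ContextIds;
};

// Each edge is referenced from both of its endpoints, hence shared
// ownership of edges.
struct SketchNode {
  std::vector<std::shared_ptr<SketchEdge>> CalleeEdges;
  std::vector<std::shared_ptr<SketchEdge>> CallerEdges;
};

struct SketchOwnedGraph {
  // The graph is the sole owner of its nodes.
  std::vector<std::unique_ptr<SketchNode>> NodeOwner;

  SketchNode *createNode() {
    NodeOwner.push_back(std::make_unique<SketchNode>());
    return NodeOwner.back().get();
  }

  // Create one edge object and register it with both endpoints.
  std::shared_ptr<SketchEdge> connect(SketchNode *Caller, SketchNode *Callee,
                                      std::set<uint32_t> Ids) {
    auto Edge = std::make_shared<SketchEdge>(
        SketchEdge{Callee, Caller, std::move(Ids)});
    Caller->CalleeEdges.push_back(Edge);
    Callee->CallerEdges.push_back(Edge);
    return Edge;
  }
};
```

In this arrangement an edge is freed once it has been dropped from both endpoint lists, so no separate bookkeeping of removed edges is needed, while nodes live exactly as long as the graph.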

Harbormaster completed remote builds in B215848: Diff 500320. Feb 25 2023, 1:09 AM

Fix typo that prevented matching of inline sequences of more than one inline, and add new tests for the fix.

Harbormaster completed remote builds in B217879: Diff 503040. Mar 7 2023, 8:52 AM

Updating again with another fix from my testing (see description below), along with a new test for it. I will be addressing the remaining comments/suggestions next.

tejohnson added inline comments. Mar 8 2023, 2:18 PM

llvm/lib/Transforms/IPO/PGHOContextDisambiguation.cpp
728	I discovered when testing with a larger app and more graph validation enabled that this early return is not correct. We can have different subsets of duplicate context nodes being propagated along different edges to the same node, and so we were stopping the propagation early. This both produced an insane graph (context ids of a node didn't match those of its caller and callee edges) and prevented some cloning. Removing the early return, however, leads to long compile time. I redesigned the context duplication and its caller, updateStackNodes, to split this handling into 3 steps: (1) walk the calls and perform all necessary context id duplication, saving the info in a map (in updateStackNodes and a modified duplicateContextIds); (2) propagate the context ids across the full graph in a single pass (in a new propagateDuplicateContextIds called from updateStackNodes); (3) do the post order traversal to generate new nodes for inlined call chains, moving the context ids determined earlier (which might be duplicates) to the new node (in updateStackNodes). This allowed handling quite a few more cold calls in my large application, with smaller compile time than the patch even without the fix to remove this early termination. I'll upload the new version of this handling in a few minutes.

Fix and improve context id duplication, add test that failed node checking without fix.

Harbormaster completed remote builds in B218212: Diff 503519. Mar 8 2023, 4:36 PM

Address most of the remaining comments.

I've now addressed all the feedback except the following 2 items that I will do next, as these will cause more global churn:

Use GraphWriter (I suspect this will cause some tests that check the graph output to need updating).
Rename PGHO* to MemProf*

llvm/lib/Transforms/IPO/PGHOContextDisambiguation.cpp
873	Refactored out and now named assignStackNodesPostOrder
978	We could, but it is clearer IMO to use a name that corresponds to the meaning at those uses below. In any case, this code has been changed and we now assign LastNode more directly.

Harbormaster completed remote builds in B218470: Diff 503874. Mar 9 2023, 2:27 PM

tejohnson marked an inline comment as done. Mar 10 2023, 12:05 PM

Use GraphWriter

About to upload a large but trivial change to replace PGHO/pgho with MemProf/memprof everywhere.

After this change I will make one more update, following a conversation with @snehasish yesterday, by splitting the ThinLTO handling out into a separate patch that depends on this one. Then it is ready for re-review.

Replace PGHO/pgho with MemProf/memprof

Harbormaster completed remote builds in B218780: Diff 504294. Mar 10 2023, 5:27 PM

Remove the ThinLTO changes which are being split into another follow on patch.

tejohnson retitled this revision from [MemProf] Context disambiguation cloning pass [patch 1/3] to [MemProf] Context disambiguation cloning pass [patch 1a/3]. Mar 10 2023, 6:06 PM

tejohnson edited the summary of this revision. (Show Details)

Removed the ThinLTO changes, and I will upload a patch with just those shortly. This patch is now ready for re-review.

tejohnson added a child revision: D145836: [MemProf] Context disambiguation cloning pass [patch 1b/3]. Mar 10 2023, 7:20 PM

Harbormaster completed remote builds in B218797: Diff 504318. Mar 10 2023, 7:47 PM

Still working my way through updateStackNodes. Here are some of the comments I have so far --

For the IR based tests, I noticed some comments which describe how the code was compiled. I wonder if it makes sense to use synthetic IR (derived from the original source) to make the tests more readable and less brittle. I took the indirect.ll test and ran it through llvm-reduce, and found that the reduced IR was much easier to follow. I attached my test.sh script and the reduced IR for indirect.ll. To run the reducer: llvm-reduce --test=test.sh ../path/to/indirect.ll. Let me know what you think.

reduced.ll (2 KB)

test.sh (7 KB)

llvm/include/llvm/Transforms/IPO/MemProfContextDisambiguation.h
29 ↗	(On Diff #504318)	I think `\p` is for parameters but here the parameter name is just `M`? https://www.doxygen.nl/manual/commands.html#cmdp
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
17 ↗	(On Diff #504318)	nit: I don't think we introduce the ThinLTO changes in this patch so maybe update this text in the follow on patches instead?
297 ↗	(On Diff #504318)	Should the StackContext be a reference too? On the other hand, these are iterators so perhaps neither should be since they are lightweight.
312 ↗	(On Diff #504318)	Iterating on the contents of this map (L935) is non-deterministic since we use a pointer as key. It would be nice to avoid non-determinism by using a MapVector even though I don't think there is any externally visible effect. For NodeToCallingFunc below, we don't iterate on the contents so I think that's fine.
350 ↗	(On Diff #504318)	Unclosed parenthesis?
398 ↗	(On Diff #504318)	This method seems to be unused in this diff. If this is used in a subsequent diff then can we move it there? Also for `moveEdgeToExistingCalleeClone` which is only called by this method.
544 ↗	(On Diff #504318)	Can we inline the `{ContextId}` list initializer here directly?
617 ↗	(On Diff #504318)	Are we marking the first AllocNode we encounter as a clone? (1) LastContextId starts from 0; (2) LastContextId is incremented in addStackNodesForMIB (and duplicateContextIds); (3) the comment at L183 indicates that 0 is used to identify a clone. A simple fix would be to start LastContextId from 1 in the constructor? I found the update to LastContextId hard to follow but I don't have a suggestion on how this could be made simpler. Also if this is a bug, then maybe a unit test to check the consistency of internal data structures might be useful.
935 ↗	(On Diff #504318)	We can use a structured binding here - `for (auto& [Func, CallsWithMetadata] : FuncToCallsWithMetadata)` Also `auto& Call` below to avoid a copy of the pair.
965 ↗	(On Diff #504318)	How about `auto& Ids = std::get<1>(Calls[0]);` since we don't need to unpack the other elements (in this if block)? Same for functor passed to the std::sort function below.
1163 ↗	(On Diff #504318)	Can we check if I isa<CallInst> (and invoke etc) before querying for metadata? I think this is simpler but a type check might speed things up by avoiding linear metadata attachment scans[1]. [1] https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/Metadata.cpp#L1239
1167 ↗	(On Diff #504318)	Should we assert `I.getMetadata(LLVMContext::MD_callsite)` is not null?
1177 ↗	(On Diff #504318)	nit: we can move this bookkeeping to `addAllocNode` (similar to how we update `AllocationCallToContextNodeMap` in `addAllocNode`).
1190 ↗	(On Diff #504318)	I think we should move the logic from L1190 to L1206 (end of the constructor) to the process method. It was a bit surprising for me to see the process method not actually do any processing .. :) Also this would bring all the dump and export calls into one place to follow sequentially.
1200 ↗	(On Diff #504318)	It would be nice to move this cleanup to the end even though it does not affect `handleCallsitesWithMultipleTargets`.
1202 ↗	(On Diff #504318)	auto& to avoid a copy?
1257 ↗	(On Diff #504318)	nit: This can be auto* too like the one below.
1355 ↗	(On Diff #504318)	Now that we have GraphTraits, can we use the graph iterators instead of our own DFS here?
1378 ↗	(On Diff #504318)	I don't see CheckEdges set to true in this diff. I think we should remove it altogether or make it always check by default if we are calling verify. What do you think?
1429 ↗	(On Diff #504318)	Now that we have GraphTraits, can we use the graph iterators instead of our own DFS here?
1443 ↗	(On Diff #504318)	s/GetNode/getNode to conform to llvm style. I think it's only used in the nodes_* functions below so we should be good. Same for GetCallee below.
1484 ↗	(On Diff #504318)	s/isSimple/IsSimple?
1563 ↗	(On Diff #504318)	We can omit the "else" part of "else if" after the return statements.
1574 ↗	(On Diff #504318)	sstream and result variable names should be InitCaps?
llvm/test/Transforms/MemProfContextDisambiguation/duplicate-context-ids2.ll
396 ↗	(On Diff #504318)	I don't think the Thin-LTO summaries are used in this patch. Not opposed to leaving it in for the next patch. Just curious whether this was intentional.
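
Two of the suggestions above (structured bindings with `auto&` at line 935, and avoiding iteration over a pointer-keyed map at line 312) can be illustrated together. The sketch below assumes C++17 and uses hypothetical names (SketchFuncCalls, sketchFlatten) with strings standing in for functions and calls; it is not the patch's FuncToCallsWithMetadata.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a function-to-calls table. A vector of pairs
// iterates in insertion order, so the visit order does not depend on
// pointer values the way a pointer-keyed map's order would.
using SketchFuncCalls =
    std::vector<std::pair<std::string, std::vector<std::string>>>;

// Visit every (function, call) combination in insertion order. The
// structured binding and the inner loop both bind by const reference, so
// neither the pair nor the call strings are copied while iterating.
std::vector<std::string> sketchFlatten(const SketchFuncCalls &FuncToCalls) {
  std::vector<std::string> Out;
  for (const auto &[Func, Calls] : FuncToCalls)
    for (const auto &Call : Calls)
      Out.push_back(Func + ":" + Call);
  return Out;
}
```

A container like this only works when lookups by key are not needed after construction, which matches the reply further down that each function is processed once.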

Address all comments other than the test reduction and graph traversal suggestions, which I will work on next.

In D140908#4198024, @snehasish wrote:

Still working my way through updateStackNodes. Here are some of the comments I have so far --

For the IR based tests, I noticed some comments which describe how the code was compiled. I wonder if it makes sense to use synthetic IR (derived from the original source) to make the tests more readable and less brittle. I took the indirect.ll test and ran it through llvm-reduce, and found that the reduced IR was much easier to follow. I attached my test.sh script and the reduced IR for indirect.ll. To run the reducer: llvm-reduce --test=test.sh ../path/to/indirect.ll. Let me know what you think.

This is a nice idea. Will do so for the tests but probably do that after addressing the other comments in case there is any more churn in the expected test output, and since it is more mechanical.

llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
17 ↗	(On Diff #504318)	Instead of removing the ThinLTO comment, I added "(eventually)" in front of it. Otherwise the CRTP design choice doesn't make sense.
312 ↗	(On Diff #504318)	Actually, the only time we index as a map is while building, and we don't really need to do this since we process each function once. I can simply change this to a vector of pairs.
617 ↗	(On Diff #504318)	The first AllocNode indeed will have 0 value here. However, that isn't a bug or in conflict with the comment, although I could see how it might be confusing. For alloc nodes we simply use LastContextId to give them unique ids that are only used in dot dumping (see the first sentence in the comment for OrigStackOrAllocId). For stack nodes this is the stack id corresponding to the node (i.e. from the metadata). The comment is just noting that clones will retain a 0 value for this field, but that it isn't an issue since we only use it during matching of callsite metadata (when the graph is being built). I rewrote the comment on L183, hopefully that clarifies things. The reason why we increment LastContextId before using it in addStackNodesForMIB is that this is where we use it for its main purpose: to assign a unique (non-zero) id to each allocation context from memprof metadata.
1190 ↗	(On Diff #504318)	I would prefer not to do this, since updateStackNodes called below is part of the graph building and should be here in the graph constructor. Note that process() is currently empty only because the graph transformations were split into patches 2 and 3.
1378 ↗	(On Diff #504318)	It's used in follow on patches, when we are checking nodes in isolation, i.e. not checking the whole graph as we do here via check()/checkNodesRecursively() (the latter ensures we check each edge exactly once). I'll remove from this patch.

lgtm, overall. The remaining comments are minor and thanks for the explanations in the tests.

llvm/include/llvm/Transforms/IPO/MemProfContextDisambiguation.h
22 ↗	(On Diff #506688)	This header seems unused in this diff?
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
17 ↗	(On Diff #504318)	Yes, that sounds good to me.
617 ↗	(On Diff #504318)	Thanks for the explanation. It makes sense to me with the updated comment.
1014 ↗	(On Diff #504318)	"after the last" sounds confusing, perhaps call out that we are iterating backwards?
1129 ↗	(On Diff #504318)	nit: Use Twine for concat?
1190 ↗	(On Diff #504318)	Got it, as I worked my way through updateStackNodes I realized that it's still just set up.
1494 ↗	(On Diff #504318)	std::to_string(int) can be replaced with Twine(int) directly to avoid redundant conversions. Also in other locations apart from this one. https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/Twine.h#L347
1513 ↗	(On Diff #504318)	Maybe just declare AttributeString as Twine and change the return statement to `return AttributeString.str();`? There is a similar pattern of usage below too, if you choose to update this one.
llvm/test/Transforms/MemProfContextDisambiguation/inlined2.ll
56 ↗	(On Diff #504318)	I guess once we reduce the tests, the heapallocsite metadata will be dropped. On a somewhat related note, should we update the code below (in a separate patch) to drop memprof metadata too? https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/DebugInfo.cpp#L852
9 ↗	(On Diff #506688)	Should this have trailing underscores too? i.e. `__attribute__((noinline))`

This revision is now accepted and ready to land. Mar 20 2023, 2:31 PM

Harbormaster completed remote builds in B220521: Diff 506688. Mar 20 2023, 3:16 PM

Address the remaining minor suggestions (still need to update the graph walks and reduce the test cases).

llvm/test/Transforms/MemProfContextDisambiguation/inlined2.ll
56 ↗	(On Diff #504318)	"I guess once we reduce the tests, the heapallocsite metadata will be dropped." Yeah, I manually removed this metadata from most of the tests, looks like I missed this one. Won't update right now since this will get stripped when I reduce the test cases. "On a somewhat related note, should we update the code below (in a separate patch) to drop memprof metadata too?" Maybe? Unlike the heapallocsite metadata, the memprof metadata doesn't become invalid when the non-line table debug info is removed (it doesn't directly reference it). So it might depend on the intent of this code - do you know when it is invoked? I looked a bit but it wasn't clear to me.

snehasish added inline comments. Mar 21 2023, 10:02 AM

llvm/test/Transforms/MemProfContextDisambiguation/inlined2.ll
56 ↗	(On Diff #504318)	From the commit message for the pass [1] it looks like the primary usage is to trim bitcode before embedding. The other use case is for bugpoint related reduction which might be useful for us. "doesn't become invalid": Yes, leaving it in is benign and it's very unlikely that a memprof annotated bitcode will end up in a place where this is a concern. Up to you if you want to take any action on this. [1] https://github.com/llvm/llvm-project/commit/e542804343ac52f40b2310eff4b3f1f045d5e5f4

tejohnson added inline comments. Mar 21 2023, 10:40 AM

llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
1513 ↗	(On Diff #504318)	Actually this didn't work here, since Twine's operator= is deleted, and getNodeAttributes requires conditional updates to the string. I was able to modify getEdgeAttribute below to use a single Twine, however. I also had to change getContextIds back to using std::string for similar reasons, but updated it to use Twine instead of std::to_string.

Fix Twine usage

tejohnson marked 2 inline comments as done. Mar 21 2023, 12:06 PM

tejohnson added inline comments.

llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
1355 ↗	(On Diff #504318)	For both this and check() below, we don't need or want DFS. The problem with DFS is that it seems to expect a single root entry node, which we don't have. But neither of these traversals need to be in any specific order. So I have simply replaced the recursive walk with using the nodes() range iterator provided by GraphTraits.h. This is the same iteration done by the graph writer. There are a few minor test changes.
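
The point made above about traversal can be shown with a small sketch: a depth-first walk from a single entry node misses parts of a graph that has several roots, whereas a flat walk over the node list visits every node exactly once. SketchCallGraph, dfs and allNodes are hypothetical names; in the patch the flat walk is the nodes() range provided through GraphTraits.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical graph: nodes are indices, Callees[N] lists the callees of N.
struct SketchCallGraph {
  std::vector<std::vector<size_t>> Callees;

  // Recursive walk from one chosen entry node; only reaches nodes that are
  // reachable from that entry.
  void dfs(size_t N, std::set<size_t> &Visited) const {
    if (!Visited.insert(N).second)
      return;
    for (size_t C : Callees[N])
      dfs(C, Visited);
  }

  // Flat walk: every node exactly once, in storage order, independent of
  // how many roots the graph has.
  std::vector<size_t> allNodes() const {
    std::vector<size_t> Order;
    for (size_t N = 0; N < Callees.size(); ++N)
      Order.push_back(N);
    return Order;
  }
};
```

Since printing and checking do not depend on visiting nodes in any particular order, the flat walk is sufficient for both.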

Use GraphTraits to iterate over the graph when printing and checking.

Reduced the test cases

Harbormaster completed remote builds in B220830: Diff 507115. Mar 21 2023, 4:31 PM

Closed by commit rGd6ad4f01c3da: [MemProf] Context disambiguation cloning pass [patch 1a/3] (authored by tejohnson). Mar 22 2023, 7:05 AM

This revision was automatically updated to reflect the committed changes.

tejohnson added a commit: rGd6ad4f01c3da: [MemProf] Context disambiguation cloning pass [patch 1a/3].

nikic added a reverting change: rG883dbb9c86be: Revert "[MemProf] Context disambiguation cloning pass [patch 1a/3]". Mar 22 2023, 7:45 AM

tejohnson added a commit: rG700cd99061ed: Restore "[MemProf] Context disambiguation cloning pass [patch 1a/3]". Mar 22 2023, 10:16 AM

Revision Contents

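Path	Size
llvm/include/llvm/IR/ModuleSummaryIndex.h	10 lines
llvm/include/llvm/Transforms/IPO/PGHOContextDisambiguation.h	45 lines
llvm/lib/LTO/LTO.cpp	16 lines
llvm/lib/Passes/PassBuilder.cpp	1 line
llvm/lib/Passes/PassBuilderPipelines.cpp	11 lines
llvm/lib/Passes/PassRegistry.def	1 line
llvm/lib/Transforms/IPO/CMakeLists.txt	1 line
llvm/lib/Transforms/IPO/PGHOContextDisambiguation.cpp	1774 lines
llvm/test/ThinLTO/X86/pgho-basic.ll	211 lines
llvm/test/ThinLTO/X86/pgho-duplicate-context-ids.ll	311 lines
llvm/test/ThinLTO/X86/pgho-indirectcall.ll	449 lines
llvm/test/ThinLTO/X86/pgho-inlined.ll	244 lines
llvm/test/Transforms/PGHOContextDisambiguation/basic.ll	203 lines
llvm/test/Transforms/PGHOContextDisambiguation/duplicate-context-ids.ll	300 lines
llvm/test/Transforms/PGHOContextDisambiguation/indirectcall.ll	435 lines
llvm/test/Transforms/PGHOContextDisambiguation/inlined.ll	233 lines
llvm/test/Transforms/PGHOContextDisambiguation/pass-pipeline.ll	41 lines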

Diff 500320

llvm/include/llvm/IR/ModuleSummaryIndex.h

[lines elided; context: struct CallsiteInfo {]
   CallsiteInfo(ValueInfo Callee, SmallVector<unsigned> StackIdIndices)
       : Callee(Callee), StackIdIndices(std::move(StackIdIndices)) {}
   CallsiteInfo(ValueInfo Callee, SmallVector<unsigned> Clones,
                SmallVector<unsigned> StackIdIndices)
       : Callee(Callee), Clones(std::move(Clones)),
         StackIdIndices(std::move(StackIdIndices)) {}
 };

 inline raw_ostream &operator<<(raw_ostream &OS, const CallsiteInfo &SNI) {
[Inline comment] davidxl (Done): dumping utilities can be split out. Similarly for new interfaces like 'last', or set_subtract.
[Inline comment] snehasish (Done): +1
   OS << "Callee: " << SNI.Callee;
   bool First = true;
   OS << " Clones: ";
   for (auto V : SNI.Clones) {
     if (!First)
       OS << ", ";
     First = false;
     OS << V;
(656 lines not shown)	public:
const TypeIdInfo *getTypeIdInfo() const { return TIdInfo.get(); };		const TypeIdInfo *getTypeIdInfo() const { return TIdInfo.get(); };

ArrayRef<CallsiteInfo> callsites() const {		ArrayRef<CallsiteInfo> callsites() const {
if (Callsites)		if (Callsites)
return *Callsites;		return *Callsites;
return {};		return {};
}		}

		CallsitesTy &mutableCallsites() {
		assert(Callsites);
		snehasish (Not Done): The behaviour of the const and mutable version of the callsites function differ. In `callsites` if its null it returns an empty array ref, for mutableCallsites it may assert (or segfault on an opt build). Should we avoid the assert by allocating storage instead?
		tejohnson (Author, Done): I'd prefer not to allocate, based on where and how this is used. Since it is returning a reference, I can't have it return an empty vector like callsites().
		return *Callsites;
		}
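
The accessor asymmetry discussed in the inline comments on `mutableCallsites()` can be illustrated with a minimal standalone sketch. The `Summary` class, its `int` payload, and the `addCallsite`/`hasCallsites` helpers are hypothetical stand-ins and not the real `FunctionSummary` API; only the pattern is taken from the diff: lazily allocated storage, a const accessor that tolerates null, and a mutable accessor that asserts.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a summary with optional, lazily allocated storage.
class Summary {
  std::unique_ptr<std::vector<int>> Callsites; // null until something is added

public:
  void addCallsite(int C) {
    if (!Callsites)
      Callsites = std::make_unique<std::vector<int>>();
    Callsites->push_back(C);
  }

  bool hasCallsites() const { return Callsites != nullptr; }

  // Read-only view: safe on an empty summary, falls back to a shared empty
  // vector instead of dereferencing null.
  const std::vector<int> &callsites() const {
    static const std::vector<int> Empty;
    return Callsites ? *Callsites : Empty;
  }

  // Mutable access returns a reference into the storage, so there is nothing
  // sensible to hand back when the storage was never allocated.
  std::vector<int> &mutableCallsites() {
    assert(Callsites && "no callsite storage allocated");
    return *Callsites;
  }
};
```

Under this sketch a caller would check `hasCallsites()` before calling `mutableCallsites()`, which mirrors the author's preference not to allocate on access.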

ArrayRef<AllocInfo> allocs() const {		ArrayRef<AllocInfo> allocs() const {
if (Allocs)		if (Allocs)
return *Allocs;		return *Allocs;
return {};		return {};
}		}

		AllocsTy &mutableAllocs() {
		assert(Allocs);
		return *Allocs;
		}

friend struct GraphTraits<ValueInfo>;		friend struct GraphTraits<ValueInfo>;
};		};

template <> struct DenseMapInfo<FunctionSummary::VFuncId> {		template <> struct DenseMapInfo<FunctionSummary::VFuncId> {
static FunctionSummary::VFuncId getEmptyKey() { return {0, uint64_t(-1)}; }		static FunctionSummary::VFuncId getEmptyKey() { return {0, uint64_t(-1)}; }

static FunctionSummary::VFuncId getTombstoneKey() {		static FunctionSummary::VFuncId getTombstoneKey() {
return {0, uint64_t(-2)};		return {0, uint64_t(-2)};
(868 lines not shown)

llvm/include/llvm/Transforms/IPO/PGHOContextDisambiguation.h

This file was added.

				//====- PGHOContextDisambiguation.h - Context Disambiguation ----- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Implements support for context disambiguation of allocation calls for profile
				// guided heap optimization. See implementation file for details.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_IPO_PGHO_CONTEXT_DISAMBIGUATION_H
				#define LLVM_TRANSFORMS_IPO_PGHO_CONTEXT_DISAMBIGUATION_H

				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/StringSet.h"
				#include "llvm/IR/GlobalValue.h"
				#include "llvm/IR/PassManager.h"
				#include <functional>

				namespace llvm {
				class GlobalValueSummary;
				class Module;
				class ModuleSummaryIndex;

				class PGHOContextDisambiguation
				: public PassInfoMixin<PGHOContextDisambiguation> {
				/// Run the context disambiguator on \p M, returns true if any changes
				/// were made.
				bool processModule(Module &M);

				public:
				PGHOContextDisambiguation() {}

				PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);

				void run(ModuleSummaryIndex &Index,
				function_ref<bool(GlobalValue::GUID, const GlobalValueSummary *)>
				isPrevailing);
				};
				} // end namespace llvm

				#endif // LLVM_TRANSFORMS_IPO_PGHO_CONTEXT_DISAMBIGUATION_H

llvm/lib/LTO/LTO.cpp

(45 lines not shown)
#include "llvm/Support/ThreadPool.h"		#include "llvm/Support/ThreadPool.h"
#include "llvm/Support/Threading.h"		#include "llvm/Support/Threading.h"
#include "llvm/Support/TimeProfiler.h"		#include "llvm/Support/TimeProfiler.h"
#include "llvm/Support/ToolOutputFile.h"		#include "llvm/Support/ToolOutputFile.h"
#include "llvm/Support/VCSRevision.h"		#include "llvm/Support/VCSRevision.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetOptions.h"		#include "llvm/Target/TargetOptions.h"
#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
		#include "llvm/Transforms/IPO/PGHOContextDisambiguation.h"
#include "llvm/Transforms/IPO/WholeProgramDevirt.h"		#include "llvm/Transforms/IPO/WholeProgramDevirt.h"
#include "llvm/Transforms/Utils/FunctionImportUtils.h"		#include "llvm/Transforms/Utils/FunctionImportUtils.h"
#include "llvm/Transforms/Utils/SplitModule.h"		#include "llvm/Transforms/Utils/SplitModule.h"

#include <optional>		#include <optional>
#include <set>		#include <set>

using namespace llvm;		using namespace llvm;
using namespace lto;		using namespace lto;
using namespace object;		using namespace object;

#define DEBUG_TYPE "lto"		#define DEBUG_TYPE "lto"

static cl::opt<bool>		static cl::opt<bool>
DumpThinCGSCCs("dump-thin-cg-sccs", cl::init(false), cl::Hidden,		DumpThinCGSCCs("dump-thin-cg-sccs", cl::init(false), cl::Hidden,
cl::desc("Dump the SCCs in the ThinLTO index's callgraph"));		cl::desc("Dump the SCCs in the ThinLTO index's callgraph"));

namespace llvm {		namespace llvm {
/// Enable global value internalization in LTO.		/// Enable global value internalization in LTO.
cl::opt<bool> EnableLTOInternalization(		cl::opt<bool> EnableLTOInternalization(
"enable-lto-internalization", cl::init(true), cl::Hidden,		"enable-lto-internalization", cl::init(true), cl::Hidden,
cl::desc("Enable global value internalization in LTO"));		cl::desc("Enable global value internalization in LTO"));
}		}

		/// Enable PGHO context disambiguation for thin link.
		extern cl::opt<bool> EnablePGHOContextDisambiguation;

// Computes a unique hash for the Module considering the current list of		// Computes a unique hash for the Module considering the current list of
// export/import and other global analysis results.		// export/import and other global analysis results.
// The hash is produced in \p Key.		// The hash is produced in \p Key.
void llvm::computeLTOCacheKey(		void llvm::computeLTOCacheKey(
SmallString<40> &Key, const Config &Conf, const ModuleSummaryIndex &Index,		SmallString<40> &Key, const Config &Conf, const ModuleSummaryIndex &Index,
StringRef ModuleID, const FunctionImporter::ImportMapTy &ImportList,		StringRef ModuleID, const FunctionImporter::ImportMapTy &ImportList,
const FunctionImporter::ExportSetTy &ExportList,		const FunctionImporter::ExportSetTy &ExportList,
const std::map<GlobalValue::GUID, GlobalValue::LinkageTypes> &ResolvedODR,		const std::map<GlobalValue::GUID, GlobalValue::LinkageTypes> &ResolvedODR,
(1,445 lines not shown)	Error LTO::runThinLTO(AddStreamFn AddStream, FileCache Cache,

// Perform index-based WPD. This will return immediately if there are		// Perform index-based WPD. This will return immediately if there are
// no index entries in the typeIdMetadata map (e.g. if we are instead		// no index entries in the typeIdMetadata map (e.g. if we are instead
// performing IR-based WPD in hybrid regular/thin LTO mode).		// performing IR-based WPD in hybrid regular/thin LTO mode).
std::map<ValueInfo, std::vector<VTableSlotSummary>> LocalWPDTargetsMap;		std::map<ValueInfo, std::vector<VTableSlotSummary>> LocalWPDTargetsMap;
runWholeProgramDevirtOnIndex(ThinLTO.CombinedIndex, ExportedGUIDs,		runWholeProgramDevirtOnIndex(ThinLTO.CombinedIndex, ExportedGUIDs,
LocalWPDTargetsMap);		LocalWPDTargetsMap);

		auto isPrevailing = [&](GlobalValue::GUID GUID, const GlobalValueSummary *S) {
		return ThinLTO.PrevailingModuleForGUID[GUID] == S->modulePath();
		};
		if (EnablePGHOContextDisambiguation) {
		PGHOContextDisambiguation ContextDisambiguation;
		ContextDisambiguation.run(ThinLTO.CombinedIndex, isPrevailing);
		}

if (Conf.OptLevel > 0)		if (Conf.OptLevel > 0)
ComputeCrossModuleImport(ThinLTO.CombinedIndex, ModuleToDefinedGVSummaries,		ComputeCrossModuleImport(ThinLTO.CombinedIndex, ModuleToDefinedGVSummaries,
ImportLists, ExportLists);		ImportLists, ExportLists);

// Figure out which symbols need to be internalized. This also needs to happen		// Figure out which symbols need to be internalized. This also needs to happen
// at -O0 because summary-based DCE is implemented using internalization, and		// at -O0 because summary-based DCE is implemented using internalization, and
// we must apply DCE consistently with the full LTO module in order to avoid		// we must apply DCE consistently with the full LTO module in order to avoid
// undefined references during the final link.		// undefined references during the final link.
(25 lines not shown)	return (ExportList != ExportLists.end() && ExportList->second.count(VI)) ||
ExportedGUIDs.count(VI.getGUID());		ExportedGUIDs.count(VI.getGUID());
};		};

// Update local devirtualized targets that were exported by cross-module		// Update local devirtualized targets that were exported by cross-module
// importing or by other devirtualizations marked in the ExportedGUIDs set.		// importing or by other devirtualizations marked in the ExportedGUIDs set.
updateIndexWPDForExports(ThinLTO.CombinedIndex, isExported,		updateIndexWPDForExports(ThinLTO.CombinedIndex, isExported,
LocalWPDTargetsMap);		LocalWPDTargetsMap);

auto isPrevailing = [&](GlobalValue::GUID GUID,
const GlobalValueSummary *S) {
return ThinLTO.PrevailingModuleForGUID[GUID] == S->modulePath();
};
thinLTOInternalizeAndPromoteInIndex(ThinLTO.CombinedIndex, isExported,		thinLTOInternalizeAndPromoteInIndex(ThinLTO.CombinedIndex, isExported,
isPrevailing);		isPrevailing);

auto recordNewLinkage = [&](StringRef ModuleIdentifier,		auto recordNewLinkage = [&](StringRef ModuleIdentifier,
GlobalValue::GUID GUID,		GlobalValue::GUID GUID,
GlobalValue::LinkageTypes NewLinkage) {		GlobalValue::LinkageTypes NewLinkage) {
ResolvedODR[ModuleIdentifier][GUID] = NewLinkage;		ResolvedODR[ModuleIdentifier][GUID] = NewLinkage;
};		};
(107 lines not shown)

llvm/lib/Passes/PassBuilder.cpp

	(114 lines not shown)
	#include "llvm/Transforms/IPO/InferFunctionAttrs.h"			#include "llvm/Transforms/IPO/InferFunctionAttrs.h"
	#include "llvm/Transforms/IPO/Inliner.h"			#include "llvm/Transforms/IPO/Inliner.h"
	#include "llvm/Transforms/IPO/Internalize.h"			#include "llvm/Transforms/IPO/Internalize.h"
	#include "llvm/Transforms/IPO/LoopExtractor.h"			#include "llvm/Transforms/IPO/LoopExtractor.h"
	#include "llvm/Transforms/IPO/LowerTypeTests.h"			#include "llvm/Transforms/IPO/LowerTypeTests.h"
	#include "llvm/Transforms/IPO/MergeFunctions.h"			#include "llvm/Transforms/IPO/MergeFunctions.h"
	#include "llvm/Transforms/IPO/ModuleInliner.h"			#include "llvm/Transforms/IPO/ModuleInliner.h"
	#include "llvm/Transforms/IPO/OpenMPOpt.h"			#include "llvm/Transforms/IPO/OpenMPOpt.h"
				#include "llvm/Transforms/IPO/PGHOContextDisambiguation.h"
	#include "llvm/Transforms/IPO/PartialInlining.h"			#include "llvm/Transforms/IPO/PartialInlining.h"
	#include "llvm/Transforms/IPO/SCCP.h"			#include "llvm/Transforms/IPO/SCCP.h"
	#include "llvm/Transforms/IPO/SampleProfile.h"			#include "llvm/Transforms/IPO/SampleProfile.h"
	#include "llvm/Transforms/IPO/SampleProfileProbe.h"			#include "llvm/Transforms/IPO/SampleProfileProbe.h"
	#include "llvm/Transforms/IPO/StripDeadPrototypes.h"			#include "llvm/Transforms/IPO/StripDeadPrototypes.h"
	#include "llvm/Transforms/IPO/StripSymbols.h"			#include "llvm/Transforms/IPO/StripSymbols.h"
	#include "llvm/Transforms/IPO/SyntheticCountsPropagation.h"			#include "llvm/Transforms/IPO/SyntheticCountsPropagation.h"
	#include "llvm/Transforms/IPO/WholeProgramDevirt.h"			#include "llvm/Transforms/IPO/WholeProgramDevirt.h"
	(1,837 lines not shown)

llvm/lib/Passes/PassBuilderPipelines.cpp

(53 lines not shown)
#include "llvm/Transforms/IPO/HotColdSplitting.h"		#include "llvm/Transforms/IPO/HotColdSplitting.h"
#include "llvm/Transforms/IPO/IROutliner.h"		#include "llvm/Transforms/IPO/IROutliner.h"
#include "llvm/Transforms/IPO/InferFunctionAttrs.h"		#include "llvm/Transforms/IPO/InferFunctionAttrs.h"
#include "llvm/Transforms/IPO/Inliner.h"		#include "llvm/Transforms/IPO/Inliner.h"
#include "llvm/Transforms/IPO/LowerTypeTests.h"		#include "llvm/Transforms/IPO/LowerTypeTests.h"
#include "llvm/Transforms/IPO/MergeFunctions.h"		#include "llvm/Transforms/IPO/MergeFunctions.h"
#include "llvm/Transforms/IPO/ModuleInliner.h"		#include "llvm/Transforms/IPO/ModuleInliner.h"
#include "llvm/Transforms/IPO/OpenMPOpt.h"		#include "llvm/Transforms/IPO/OpenMPOpt.h"
		#include "llvm/Transforms/IPO/PGHOContextDisambiguation.h"
#include "llvm/Transforms/IPO/PartialInlining.h"		#include "llvm/Transforms/IPO/PartialInlining.h"
#include "llvm/Transforms/IPO/SCCP.h"		#include "llvm/Transforms/IPO/SCCP.h"
#include "llvm/Transforms/IPO/SampleProfile.h"		#include "llvm/Transforms/IPO/SampleProfile.h"
#include "llvm/Transforms/IPO/SampleProfileProbe.h"		#include "llvm/Transforms/IPO/SampleProfileProbe.h"
#include "llvm/Transforms/IPO/SyntheticCountsPropagation.h"		#include "llvm/Transforms/IPO/SyntheticCountsPropagation.h"
#include "llvm/Transforms/IPO/WholeProgramDevirt.h"		#include "llvm/Transforms/IPO/WholeProgramDevirt.h"
#include "llvm/Transforms/InstCombine/InstCombine.h"		#include "llvm/Transforms/InstCombine/InstCombine.h"
#include "llvm/Transforms/Instrumentation/CGProfile.h"		#include "llvm/Transforms/Instrumentation/CGProfile.h"
(200 lines not shown)	cl::values(clEnumValN(AttributorRunOption::ALL, "all",
"enable all attributor runs"),		"enable all attributor runs"),
clEnumValN(AttributorRunOption::MODULE, "module",		clEnumValN(AttributorRunOption::MODULE, "module",
"enable module-wide attributor runs"),		"enable module-wide attributor runs"),
clEnumValN(AttributorRunOption::CGSCC, "cgscc",		clEnumValN(AttributorRunOption::CGSCC, "cgscc",
"enable call graph SCC attributor runs"),		"enable call graph SCC attributor runs"),
clEnumValN(AttributorRunOption::NONE, "none",		clEnumValN(AttributorRunOption::NONE, "none",
"disable attributor runs")));		"disable attributor runs")));

		cl::opt<bool> EnablePGHOContextDisambiguation(
		"enable-pgho-context-disambiguation", cl::init(false), cl::Hidden,
		cl::ZeroOrMore, cl::desc("Enable PGHO context disambiguation"));

PipelineTuningOptions::PipelineTuningOptions() {		PipelineTuningOptions::PipelineTuningOptions() {
LoopInterleaving = true;		LoopInterleaving = true;
LoopVectorization = true;		LoopVectorization = true;
SLPVectorization = false;		SLPVectorization = false;
LoopUnrolling = true;		LoopUnrolling = true;
ForgetAllSCEVInLoopUnroll = ForgetSCEVInLoopUnroll;		ForgetAllSCEVInLoopUnroll = ForgetSCEVInLoopUnroll;
LicmMssaOptCap = SetLicmMssaOptCap;		LicmMssaOptCap = SetLicmMssaOptCap;
LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap;		LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap;
(1,421 lines not shown)	PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
// invoke or a call.		// invoke or a call.
// Run the inliner now.		// Run the inliner now.
MPM.addPass(ModuleInlinerWrapperPass(		MPM.addPass(ModuleInlinerWrapperPass(
getInlineParamsFromOptLevel(Level),		getInlineParamsFromOptLevel(Level),
/* MandatoryFirst */ true,		/* MandatoryFirst */ true,
InlineContext{ThinOrFullLTOPhase::FullLTOPostLink,		InlineContext{ThinOrFullLTOPhase::FullLTOPostLink,
InlinePass::CGSCCInliner}));		InlinePass::CGSCCInliner}));

		// Perform context disambiguation after inlining, since that would reduce the
		// amount of additional cloning required to distinguish the allocation
		// contexts.
		if (EnablePGHOContextDisambiguation)
		MPM.addPass(PGHOContextDisambiguation());

// Optimize globals again after we ran the inliner.		// Optimize globals again after we ran the inliner.
MPM.addPass(GlobalOptPass());		MPM.addPass(GlobalOptPass());

// Run the OpenMPOpt pass again after global optimizations.		// Run the OpenMPOpt pass again after global optimizations.
MPM.addPass(OpenMPOptPass(ThinOrFullLTOPhase::FullLTOPostLink));		MPM.addPass(OpenMPOptPass(ThinOrFullLTOPhase::FullLTOPostLink));

// Garbage collect dead functions.		// Garbage collect dead functions.
MPM.addPass(GlobalDCEPass());		MPM.addPass(GlobalDCEPass());
(280 lines not shown)

llvm/lib/Passes/PassRegistry.def

	(81 lines not shown)
	MODULE_PASS("lower-ifunc", LowerIFuncPass())			MODULE_PASS("lower-ifunc", LowerIFuncPass())
	MODULE_PASS("lowertypetests", LowerTypeTestsPass())			MODULE_PASS("lowertypetests", LowerTypeTestsPass())
	MODULE_PASS("metarenamer", MetaRenamerPass())			MODULE_PASS("metarenamer", MetaRenamerPass())
	MODULE_PASS("mergefunc", MergeFunctionsPass())			MODULE_PASS("mergefunc", MergeFunctionsPass())
	MODULE_PASS("name-anon-globals", NameAnonGlobalPass())			MODULE_PASS("name-anon-globals", NameAnonGlobalPass())
	MODULE_PASS("no-op-module", NoOpModulePass())			MODULE_PASS("no-op-module", NoOpModulePass())
	MODULE_PASS("objc-arc-apelim", ObjCARCAPElimPass())			MODULE_PASS("objc-arc-apelim", ObjCARCAPElimPass())
	MODULE_PASS("partial-inliner", PartialInlinerPass())			MODULE_PASS("partial-inliner", PartialInlinerPass())
				MODULE_PASS("pgho-context-disambiguation", PGHOContextDisambiguation())
	MODULE_PASS("pgo-icall-prom", PGOIndirectCallPromotion())			MODULE_PASS("pgo-icall-prom", PGOIndirectCallPromotion())
	MODULE_PASS("pgo-instr-gen", PGOInstrumentationGen())			MODULE_PASS("pgo-instr-gen", PGOInstrumentationGen())
	MODULE_PASS("pgo-instr-use", PGOInstrumentationUse())			MODULE_PASS("pgo-instr-use", PGOInstrumentationUse())
	MODULE_PASS("print-profile-summary", ProfileSummaryPrinterPass(dbgs()))			MODULE_PASS("print-profile-summary", ProfileSummaryPrinterPass(dbgs()))
	MODULE_PASS("print-callgraph", CallGraphPrinterPass(dbgs()))			MODULE_PASS("print-callgraph", CallGraphPrinterPass(dbgs()))
	MODULE_PASS("print-callgraph-sccs", CallGraphSCCsPrinterPass(dbgs()))			MODULE_PASS("print-callgraph-sccs", CallGraphSCCsPrinterPass(dbgs()))
	MODULE_PASS("print", PrintModulePass(dbgs()))			MODULE_PASS("print", PrintModulePass(dbgs()))
	MODULE_PASS("print-lcg", LazyCallGraphPrinterPass(dbgs()))			MODULE_PASS("print-lcg", LazyCallGraphPrinterPass(dbgs()))
	(509 lines not shown)

llvm/lib/Transforms/IPO/CMakeLists.txt

(26 lines not shown)	add_llvm_component_library(LLVMipo
Internalize.cpp		Internalize.cpp
LoopExtractor.cpp		LoopExtractor.cpp
LowerTypeTests.cpp		LowerTypeTests.cpp
MergeFunctions.cpp		MergeFunctions.cpp
ModuleInliner.cpp		ModuleInliner.cpp
OpenMPOpt.cpp		OpenMPOpt.cpp
PartialInlining.cpp		PartialInlining.cpp
PassManagerBuilder.cpp		PassManagerBuilder.cpp
		PGHOContextDisambiguation.cpp
SampleContextTracker.cpp		SampleContextTracker.cpp
SampleProfile.cpp		SampleProfile.cpp
SampleProfileProbe.cpp		SampleProfileProbe.cpp
SCCP.cpp		SCCP.cpp
StripDeadPrototypes.cpp		StripDeadPrototypes.cpp
StripSymbols.cpp		StripSymbols.cpp
SyntheticCountsPropagation.cpp		SyntheticCountsPropagation.cpp
ThinLTOBitcodeWriter.cpp		ThinLTOBitcodeWriter.cpp
(32 lines not shown)

llvm/lib/Transforms/IPO/PGHOContextDisambiguation.cpp

This file was added.

				//===-- PGHOContextDisambiguation.cpp - Disambiguate contexts -------------===//
				//
				snehasish (Done): So far we've used memprof to refer to all of the prior work, I would strongly prefer we continue using the same to make it easy to discover all the related code by just searching for a single term. This is also useful for listing all the options related to memprof which start with the "memprof-" prefix. Wdyt?
				tejohnson (Author, Not Done): ack - I was wondering the same thing myself, will do a rename.
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements support for context disambiguation of allocation
				// calls for profile guided heap optimization. Specifically, it uses Memprof
				// profiles which indicate context specific allocation behavior (currently
				// distinguishing cold vs hot memory allocations). Cloning is performed to
				// expose the cold allocation call contexts, and the allocation calls are
				// subsequently annotated with an attribute for later transformation.
				//
				// The transformations can be performed either directly on IR (regular LTO), or
				// on a ThinLTO index (and later applied to the IR during the ThinLTO backend).
				// Both types of LTO operate on the same base graph representation, which
				// uses CRTP to support either IR or Index formats.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/IPO/PGHOContextDisambiguation.h"
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/DenseSet.h"
				#include "llvm/ADT/SetOperations.h"
				#include "llvm/ADT/SmallPtrSet.h"
				#include "llvm/ADT/SmallSet.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/Analysis/MemoryProfileInfo.h"
				#include "llvm/Analysis/ModuleSummaryAnalysis.h"
				#include "llvm/IR/Constants.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/Module.h"
				#include "llvm/IR/ModuleSummaryIndex.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/FileSystem.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Transforms/IPO.h"
				#include <sstream>
				#include <vector>
				using namespace llvm;
				using namespace llvm::memprof;

				#define DEBUG_TYPE "pgho-context-disambiguation"


				static cl::opt<std::string> DotFilePathPrefix(
				"pgho-dot-file-path-prefix", cl::init(""), cl::Hidden,
				cl::value_desc("filename"),
				cl::desc("Specify the path prefix of the PGHO dot files."));

				static cl::opt<bool> ExportToDot("pgho-export-to-dot", cl::init(false),
				cl::Hidden,
				cl::desc("Export graph to dot files."));

				static cl::opt<bool> DumpCCG("pgho-dump-ccg", cl::init(false), cl::Hidden,
				cl::desc("Dump CCG to stdout after each stage."));

				snehasish (Done): nit: Expand CCG to CallingContextGraph here and below.
				static cl::opt<bool> VerifyCCG("pgho-verify-ccg", cl::init(false), cl::Hidden,
				cl::desc("Perform verification checks on CCG."));

				static cl::opt<bool>
				VerifyNodes("pgho-verify-nodes", cl::init(false), cl::Hidden,
				cl::desc("Perform frequent verification checks on nodes."));

				inline bool hasSingleAllocType(uint8_t AllocTypes) {
				switch (AllocTypes) {
				snehasish (Done): Can we move this to MemoryProfileInfo.h and use it here as well as MemoryProfileInfo.cpp where it exists as a static function?
				case (uint8_t)AllocationType::Cold:
				case (uint8_t)AllocationType::NotCold:
				return true;
				break;
				case (uint8_t)AllocationType::None:
				assert(false);
				break;
				default:
				return false;
				break;
				}
				llvm_unreachable("invalid alloc type");
				}
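
The allocation type mask checked by `hasSingleAllocType` can be sketched in isolation. This is a minimal illustration that assumes each `AllocationType` value occupies its own bit (`NotCold = 1`, `Cold = 2`), which the diff does not show. The helper names `addAllocType` and `hasSingleAllocTypeSketch` are hypothetical, and the check uses a power-of-two bit test rather than the `switch` in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed enum values for illustration: each type occupies its own bit.
enum class AllocationType : uint8_t { None = 0, NotCold = 1, Cold = 2 };

// OR a context's allocation type into a node's or edge's accumulated mask.
inline uint8_t addAllocType(uint8_t Mask, AllocationType T) {
  return static_cast<uint8_t>(Mask | static_cast<uint8_t>(T));
}

// True when exactly one allocation type bit is set in the mask.
// Swapped-in technique: a power-of-two test instead of the patch's switch.
inline bool hasSingleAllocTypeSketch(uint8_t Mask) {
  assert(Mask != 0 && "mask must contain at least one allocation type");
  return (Mask & (Mask - 1)) == 0;
}
```

A node or edge whose mask fails this check carries both cold and not-cold contexts, which is the ambiguous case the pass description says cloning is meant to expose.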

				/// CRTP base for graphs built from either IR or ThinLTO summary index.
				///
				/// The graph represents the call contexts in all memprof metadata on allocation
				/// calls, with nodes for the allocations themselves, as well as for the calls
				/// in each context. The graph is initially built from the allocation memprof
				/// metadata (or summary) MIBs. It is then updated to match calls with callsite
				/// metadata onto the nodes, updating it to reflect any inlining performed on
				/// those calls.
				///
				/// Each MIB (representing an allocation's call context with allocation
				/// behavior) is assigned a unique context id during the graph build. The edges
				/// and nodes in the graph are decorated with the context ids they carry. This
				/// is used to correctly update the graph when cloning is performed so that we
				/// can uniquify the context for a single (possibly cloned) allocation.
				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				class CallsiteContextGraph {
				public:
				CallsiteContextGraph() = default;
				CallsiteContextGraph(const CallsiteContextGraph &) = default;
				CallsiteContextGraph(CallsiteContextGraph &&) = default;

				/// Main entry point to perform analysis and transformations on graph.
				bool process();

				void dump() const;
				void print(raw_ostream &OS) const;

				friend raw_ostream &operator<<(raw_ostream &OS,
				const CallsiteContextGraph &CCG) {
				CCG.print(OS);
				return OS;
				}

				void exportToDot(std::string Path) const;

				/// Represents a function clone via FuncTy pointer and clone number pair.
				struct FuncInfo : public std::pair<FuncTy *, unsigned /*Clone number*/> {
				using Base = std::pair<FuncTy *, unsigned>;
				FuncInfo(const Base &B) : Base(B) {}
				FuncInfo(FuncTy *F = nullptr, unsigned CloneNo = 0) : Base(F, CloneNo) {}
				snehasish (Done): I guess inheriting from std::pair provides equality operators etc? Is there any other reason to derive from std::pair as compared to just defining the members inline. nit: struct FuncInfo final (same for CallInfo below)
				tejohnson (Author, Done): Yes, it is for inheriting the operators, rather than redefining them. This same trick is used elsewhere in llvm.
				explicit operator bool() const { return this->first != nullptr; }
				FuncTy *func() const { return this->first; }
				unsigned cloneNo() const { return this->second; }
				};
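
The point made in the inline comments, that deriving from `std::pair` supplies the comparison operators for free, can be checked with a small standalone sketch. The empty `Function` struct is a hypothetical placeholder for `FuncTy`; the rest follows the shape of `FuncInfo` in the diff.

```cpp
#include <cassert>
#include <map>
#include <utility>

struct Function {}; // hypothetical placeholder for FuncTy

// Deriving from std::pair inherits ==, != and < without redefining them,
// so the type can be compared and used as an ordered map key directly.
struct FuncInfo : public std::pair<Function *, unsigned> {
  using Base = std::pair<Function *, unsigned>;
  FuncInfo(const Base &B) : Base(B) {}
  FuncInfo(Function *F = nullptr, unsigned CloneNo = 0) : Base(F, CloneNo) {}
  explicit operator bool() const { return this->first != nullptr; }
  Function *func() const { return this->first; }
  unsigned cloneNo() const { return this->second; }
};
```

Two `FuncInfo` values for the same function compare by clone number, and a `std::map<FuncInfo, T>` works without a custom comparator.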

				/// Represents a callsite clone via CallTy and clone number pair.
				struct CallInfo : public std::pair<CallTy, unsigned /*Clone number*/> {
				using Base = std::pair<CallTy, unsigned>;
				CallInfo(const Base &B) : Base(B) {}
				CallInfo(CallTy Call = nullptr, unsigned CloneNo = 0)
				: Base(Call, CloneNo) {}
				explicit operator bool() const { return (bool)this->first; }
				CallTy call() const { return this->first; }
				unsigned cloneNo() const { return this->second; }
				void setCloneNo(unsigned N) { this->second = N; }
				void print(raw_ostream &OS) const {
				if (!(bool)*this) {
				assert(!cloneNo());
				OS << "null Call";
				return;
				snehasish (Done): nit: IMO `operator bool()` is less cryptic than `(bool)*this`
				}
				call()->print(OS);
				OS << "\t(clone " << cloneNo() << ")";
				}
				void dump() const {
				print(dbgs());
				dbgs() << "\n";
				}
				friend raw_ostream &operator<<(raw_ostream &OS, const CallInfo &Call) {
				Call.print(OS);
				return OS;
				}
				};

				struct ContextEdge;

				/// Node in the Callsite Context Graph
				struct ContextNode {
				// Keep this for now since in the IR case where we have an Instruction* it
				// is not as immediately discoverable. Used for printing richer information
				// when dumping graph.
				bool IsAllocation;

				// Keeps track of when the Call was reset to null because there was
				// recursion.
				bool Recursive = false;

				// The corresponding allocation or interior call.
				CallInfo Call;

				// For alloc nodes this is a unique id assigned when constructed, and for
				// callsite stack nodes it is the original stack id when the node is
				// constructed from the memprof MIB metadata on the alloc nodes. For any
				// clones it will be 0 (it is only used when matching callsite metadata onto
				// the stack nodes created when processing the allocation memprof MIBs).
				uint64_t OrigStackOrAllocId = 0;

				// This will be formed by ORing together the AllocationType enum values
				// for contexts including this node.
				uint8_t AllocTypes = 0;

				// Edges to all callees in the profiled call stacks.
				// TODO: Should this be a map (from Callee node) for more efficient lookup?
				std::vector<std::shared_ptr<ContextEdge>> CalleeEdges;

				// Edges to all callers in the profiled call stacks.
				// TODO: Should this be a map (from Caller node) for more efficient lookup?
				std::vector<std::shared_ptr<ContextEdge>> CallerEdges;

				// The set of IDs for contexts including this node.
				DenseSet<uint32_t> ContextIds;

				// List of clones of this ContextNode, initially empty.
				std::vector<ContextNode *> Clones;

				// If a clone, points to the original uncloned node.
				ContextNode *CloneOf = nullptr;

				ContextNode(bool IsAllocation) : IsAllocation(IsAllocation), Call() {}

				ContextNode(bool IsAllocation, CallInfo C)
				: IsAllocation(IsAllocation), Call(C) {}

				std::unique_ptr<ContextNode> clone() {
				auto Clone = std::make_unique<ContextNode>(IsAllocation, Call);
				if (CloneOf) {
				CloneOf->Clones.push_back(Clone.get());
				Clone->CloneOf = CloneOf;
				} else {
				Clones.push_back(Clone.get());
				Clone->CloneOf = this;
				}
				return Clone;
				}
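
The bookkeeping in `clone()` keeps the clone list flat: a clone of a clone is still registered with the original node. A minimal sketch of that invariant follows, using a simplified `Node` type that is an assumption for illustration and omits the call, edge, and context id members.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Simplified node: only the fields needed for clone bookkeeping.
struct Node {
  std::vector<Node *> Clones; // populated only on the original node
  Node *CloneOf = nullptr;    // null for the original node

  std::unique_ptr<Node> clone() {
    auto C = std::make_unique<Node>();
    // Cloning a clone still registers the new node with the original.
    Node *Orig = CloneOf ? CloneOf : this;
    Orig->Clones.push_back(C.get());
    C->CloneOf = Orig;
    return C;
  }

  Node *getOrigNode() { return CloneOf ? CloneOf : this; }
};
```

With this layout `getOrigNode()` never has to walk a chain, since `CloneOf` always points directly at the uncloned node.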

				ContextNode *getOrigNode() {
				if (!CloneOf)
				return this;
				return CloneOf;
				}

				void addOrUpdateCallerEdge(ContextNode *Caller, AllocationType AllocType,
				unsigned int ContextId);

				ContextEdge *findEdgeFromCallee(const ContextNode *Callee);
				void eraseCalleeEdge(const ContextEdge *Edge);
				void eraseCallerEdge(const ContextEdge *Edge);

				void setCall(CallInfo C) { Call = C; }

				bool hasCall() const { return (bool)Call.call(); }

				void printCall(raw_ostream &OS) const { Call.print(OS); }

				void dump() const;
				void print(raw_ostream &OS) const;

				friend raw_ostream &operator<<(raw_ostream &OS, const ContextNode &Node) {
				Node.print(OS);
				return OS;
				}
				};

				/// Edge in the Callsite Context Graph from a ContextNode N to a caller or
				/// callee.
				struct ContextEdge {
				ContextNode *Callee;
				ContextNode *Caller;

				// This will be formed by ORing together the AllocationType enum values
				// for contexts including this edge.
				uint8_t AllocTypes = 0;

				// The set of IDs for contexts including this edge.
				DenseSet<uint32_t> ContextIds;

				ContextEdge(ContextNode *Callee, ContextNode *Caller, uint8_t AllocType,
				DenseSet<uint32_t> ContextIds)
				: Callee(Callee), Caller(Caller), AllocTypes(AllocType),
				ContextIds(ContextIds) {}

				DenseSet<uint32_t> &getContextIds() { return ContextIds; }

				void dump() const;
				void print(raw_ostream &OS) const;

				friend raw_ostream &operator<<(raw_ostream &OS, const ContextEdge &Edge) {
				Edge.print(OS);
				return OS;
				}
				};
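
How an edge accumulates allocation types and context ids can be sketched as follows. This is not the patch's implementation of `addOrUpdateCallerEdge`, whose body is not shown here; it is a hypothetical version under simplifying assumptions: `std::set` replaces `DenseSet`, allocation types are passed as raw bit values, and the edge is shared between the callee's caller list and the caller's callee list, as the `std::shared_ptr` vectors in the diff suggest.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>
#include <vector>

struct Node;

struct Edge {
  Node *Callee = nullptr;
  Node *Caller = nullptr;
  uint8_t AllocTypes = 0;        // OR of allocation type bits
  std::set<uint32_t> ContextIds; // contexts that traverse this edge
};

struct Node {
  std::vector<std::shared_ptr<Edge>> CalleeEdges;
  std::vector<std::shared_ptr<Edge>> CallerEdges;

  // Hypothetical helper: reuse the edge to Caller if present, else create it.
  void addOrUpdateCallerEdge(Node *Caller, uint8_t AllocType, uint32_t Id) {
    for (auto &E : CallerEdges) {
      if (E->Caller == Caller) {
        E->AllocTypes |= AllocType;
        E->ContextIds.insert(Id);
        return;
      }
    }
    auto E = std::make_shared<Edge>();
    E->Callee = this;
    E->Caller = Caller;
    E->AllocTypes = AllocType;
    E->ContextIds.insert(Id);
    CallerEdges.push_back(E);         // seen from the callee side
    Caller->CalleeEdges.push_back(E); // same edge, seen from the caller side
  }
};
```

Because both endpoints hold the same `Edge` object, an update made through one side is visible from the other.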

				protected:
				/// Get a list of nodes corresponding to the stack ids in the given callsite
				/// context.
				template <class NodeT, class IteratorT>
				std::vector<uint64_t>
				getStackIdsWithContextNodes(CallStack<NodeT, IteratorT> &CallsiteContext);

				/// Adds nodes for the given allocation and any stack ids on its memprof MIB
				/// metadata (or summary).
				ContextNode *addAllocNode(CallInfo Call);

				/// Adds nodes for the given MIB stack ids.
				template <class NodeT, class IteratorT>
				void addStackNodesForMIB(ContextNode *AllocNode,
				CallStack<NodeT, IteratorT> StackContext,
				CallStack<NodeT, IteratorT> &CallsiteContext,
				AllocationType AllocType);

				/// Matches all callsite metadata (or summary) to the nodes created for
				/// allocation memprof MIB metadata, synthesizing new nodes to reflect any
				/// inlining performed on those callsite instructions.
				void updateStackNodes();

				/// Update graph to conservatively handle any callsite stack nodes that target
				/// multiple different callee target functions.
				void handleCallsitesWithMultipleTargets();

				/// Save lists of calls with PGHO metadata in each function, for faster
				/// iteration.
				std::map<FuncTy *, std::vector<CallInfo>> FuncToCallsWithMetadata;

				/// Map from callsite node to the enclosing caller function.
				std::map<const ContextNode *, const FuncTy *> NodeToCallingFunc;

				private:
				using EdgeIter = typename std::vector<std::shared_ptr<ContextEdge>>::iterator;

				/// Duplicates the given set of context ids, returning the new duplicated
				/// id set. The new ids are propagated onto the corresponding nodes
				/// from FirstNode towards its leaf callees, and from LastNode towards
				/// its root callers.
				DenseSet<uint32_t>
				duplicateContextIds(const DenseSet<uint32_t> &StackSequenceContextIds,
				ContextNode *FirstNode, ContextNode *LastNode);

				/// Connect the NewNode to OrigNode as a new callee if TowardsCallee is true,
				/// else as a new caller.
				void connectNewNode(ContextNode *NewNode, ContextNode *OrigNode,
				bool TowardsCallee);

				/// Get the stack id corresponding to the given Id or Index (for IR this will
				/// return itself, for a summary index this will return the id recorded in the
				/// index for that stack id index value).
				uint64_t getStackId(uint64_t IdOrIndex) const {
				return static_cast<const DerivedCCG *>(this)->getStackId(IdOrIndex);
				}

				/// Returns true if the given call targets the given function.
				bool calleeMatchesFunc(CallTy Call, const FuncTy *Func) {
				return static_cast<DerivedCCG *>(this)->calleeMatchesFunc(Call, Func);
				}

				/// Get a list of nodes corresponding to the stack ids in the given
				/// callsite's context.
				std::vector<uint64_t> getStackIdsWithContextNodesForCall(CallTy Call) {
				return static_cast<DerivedCCG *>(this)->getStackIdsWithContextNodesForCall(
				Call);
				}

				/// Get the last stack id in the context for callsite.
				uint64_t getLastStackId(CallTy Call) {
				return static_cast<DerivedCCG *>(this)->getLastStackId(Call);
				}

				/// Gets a label to use in the dot graph for the given call clone in the given
				/// function.
				std::string getLabel(const FuncTy *Func, const CallTy Call,
				unsigned CloneNo) const {
				return static_cast<const DerivedCCG *>(this)->getLabel(Func, Call, CloneNo);
				}

				/// Helpers to find the node corresponding to the given call or stackid.
				ContextNode *getNodeForInst(const CallInfo &C);
				ContextNode *getNodeForAlloc(const CallInfo &C);
				ContextNode *getNodeForStackId(uint64_t StackId);

				/// Removes the node information recorded for the given call.
				void unsetNodeForInst(const CallInfo &C);

				/// Computes the alloc type corresponding to the given context ids, by
				/// unioning their recorded alloc types.
				uint8_t computeAllocType(DenseSet<uint32_t> &ContextIds);

				/// Create a clone of Edge's callee and move Edge to that new callee node,
				/// performing the necessary context id and allocation type updates.
				/// If callee's caller edge iterator is supplied, it is updated when removing
				/// the edge from that list.
				ContextNode *
				moveEdgeToNewCalleeClone(const std::shared_ptr<ContextEdge> &Edge,
				EdgeIter *CallerEdgeI = nullptr);

				/// Change the callee of Edge to existing callee clone NewCallee, performing
				/// the necessary context id and allocation type updates.
				/// If callee's caller edge iterator is supplied, it is updated when removing
				/// the edge from that list.
				void moveEdgeToExistingCalleeClone(const std::shared_ptr<ContextEdge> &Edge,
				ContextNode *NewCallee,
				EdgeIter *CallerEdgeI = nullptr,
				bool NewClone = false);

				/// Map from each context ID to the AllocationType assigned to that context.
				std::map<uint32_t, AllocationType> ContextIdToAllocationType;

				/// Identifies the context node created for a stack id when adding the MIB
				/// contexts to the graph. This is used to locate the context nodes when
				/// trying to assign the corresponding callsites with those stack ids to these
				/// nodes.
				std::map<uint64_t, ContextNode *> StackEntryIdToContextNodeMap;

				/// Maps to track the calls to their corresponding nodes in the graph.
				std::map<const CallInfo, ContextNode *> AllocationCallToContextNodeMap;
				std::map<const CallInfo, ContextNode *> NonAllocationCallToContextNodeMap;

				/// Owner of all ContextNode unique_ptrs.
				std::vector<std::unique_ptr<ContextNode>> NodeOwner;

				/// Perform sanity checks on graph when requested.
				void check() const;

				/// Keeps track of the last unique context id assigned.
				unsigned int LastContextId = 0;
				};

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				using ContextNode =
				typename CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextNode;
				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				using ContextEdge =
				typename CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextEdge;
				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				using FuncInfo =
				typename CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::FuncInfo;
				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				using CallInfo =
				typename CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::CallInfo;

				/// CRTP derived class for graphs built from IR (regular LTO).
				class ModuleCallsiteContextGraph
				: public CallsiteContextGraph<ModuleCallsiteContextGraph, Function,
				Instruction *> {
				public:
				ModuleCallsiteContextGraph(Module &M);

				private:
				friend CallsiteContextGraph<ModuleCallsiteContextGraph, Function,
				Instruction *>;

				uint64_t getStackId(uint64_t IdOrIndex) const;
				bool calleeMatchesFunc(Instruction *Call, const Function *Func);
				uint64_t getLastStackId(Instruction *Call);
				std::vector<uint64_t> getStackIdsWithContextNodesForCall(Instruction *Call);
				std::string getLabel(const Function *Func, const Instruction *Call,
				unsigned CloneNo) const;

				const Module &Mod;
				};

				/// Represents a call in the summary index graph, which can either be an
				/// allocation or an interior callsite node in an allocation's context.
				/// Holds a pointer to the corresponding data structure in the index.
				struct IndexCall : public PointerUnion<CallsiteInfo *, AllocInfo *> {
				IndexCall() : PointerUnion() {}
				IndexCall(std::nullptr_t) : IndexCall() {}
				IndexCall(CallsiteInfo *StackNode) : PointerUnion(StackNode) {}
				IndexCall(AllocInfo *AllocNode) : PointerUnion(AllocNode) {}

				IndexCall *operator->() { return this; }

				void print(raw_ostream &OS) const {
				if (auto *AI = dyn_cast<AllocInfo *>())
				OS << *AI;
				else {
				auto *CI = dyn_cast<CallsiteInfo *>();
				assert(CI);
				OS << *CI;
				}
				}
				};

				/// CRTP derived class for graphs built from summary index (ThinLTO).
				class IndexCallsiteContextGraph
				: public CallsiteContextGraph<IndexCallsiteContextGraph, FunctionSummary,
				IndexCall> {
				public:
				IndexCallsiteContextGraph(
				ModuleSummaryIndex &Index,
				function_ref<bool(GlobalValue::GUID, const GlobalValueSummary *)>
				isPrevailing);

				private:
				friend CallsiteContextGraph<IndexCallsiteContextGraph, FunctionSummary,
				IndexCall>;

				uint64_t getStackId(uint64_t IdOrIndex) const;
				bool calleeMatchesFunc(IndexCall &Call, const FunctionSummary *Func);
				uint64_t getLastStackId(IndexCall &Call);
				std::vector<uint64_t> getStackIdsWithContextNodesForCall(IndexCall &Call);
				std::string getLabel(const FunctionSummary *Func, const IndexCall &Call,
				unsigned CloneNo) const;

				// Saves mapping from function summaries containing memprof records back to
				// its VI, for use in checking and debugging.
				std::map<const FunctionSummary *, ValueInfo> FSToVIMap;

				const ModuleSummaryIndex &Index;
				};

				namespace {

				struct FieldSeparator {
				bool Skip = true;
				const char *Sep;

				FieldSeparator(const char *Sep = ", ") : Sep(Sep) {}
				};

				raw_ostream &operator<<(raw_ostream &OS, FieldSeparator &FS) {
				if (FS.Skip) {
				FS.Skip = false;
				return OS;
				}
				return OS << FS.Sep;
				}

				// Map the uint8_t alloc types (which may contain NotCold|Cold) to the alloc
				// type we should actually use on the corresponding allocation.
				// If we can't clone a node that has NotCold+Cold alloc type, we will fall
				// back to using NotCold. So don't bother cloning to distinguish NotCold+Cold
				// from NotCold.
				AllocationType allocTypeToUse(uint8_t AllocTypes) {
				assert(AllocTypes != (uint8_t)AllocationType::None);
				if (AllocTypes ==
				((uint8_t)AllocationType::NotCold | (uint8_t)AllocationType::Cold))
				return AllocationType::NotCold;
				else
				return (AllocationType)AllocTypes;
				}

				} // end anonymous namespace
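
The two helpers in the anonymous namespace above can be tried outside of LLVM. The following is a minimal standalone sketch, not the patch's code: it assumes `std::ostream` in place of `raw_ostream`, a local `AllocationType` enum whose bit values (None = 0, NotCold = 1, Cold = 2) are chosen only for illustration, and a hypothetical `joinIds` helper that shows how a `FieldSeparator` is meant to be streamed before each element.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Assumed stand-in for llvm::AllocationType; the bit values are illustrative.
enum class AllocationType : uint8_t { None = 0, NotCold = 1, Cold = 2 };

// Emits nothing the first time it is streamed and the separator afterwards.
struct FieldSeparator {
  bool Skip = true;
  const char *Sep;

  FieldSeparator(const char *Sep = ", ") : Sep(Sep) {}
};

std::ostream &operator<<(std::ostream &OS, FieldSeparator &FS) {
  if (FS.Skip) {
    FS.Skip = false;
    return OS;
  }
  return OS << FS.Sep;
}

// A NotCold|Cold mix falls back to NotCold; any single type is used as-is.
AllocationType allocTypeToUse(uint8_t AllocTypes) {
  assert(AllocTypes != (uint8_t)AllocationType::None);
  if (AllocTypes ==
      ((uint8_t)AllocationType::NotCold | (uint8_t)AllocationType::Cold))
    return AllocationType::NotCold;
  return (AllocationType)AllocTypes;
}

// Hypothetical helper (not in the patch): joins ids using FieldSeparator.
std::string joinIds(const std::vector<uint32_t> &Ids) {
  std::ostringstream OS;
  FieldSeparator FS;
  for (uint32_t Id : Ids)
    OS << FS << Id;
  return OS.str();
}
```

Streaming the separator before each element, rather than after, is what avoids a trailing separator without tracking the element index.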

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				ContextNode<DerivedCCG, FuncTy, CallTy> *
				CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(
				const CallInfo &C) {
				ContextNode *Node = getNodeForAlloc(C);
				if (Node)
				return Node;

				auto NonAllocCallNode = NonAllocationCallToContextNodeMap.find(C);
				if (NonAllocCallNode != NonAllocationCallToContextNodeMap.end()) {
				return NonAllocCallNode->second;
				}
				return nullptr;
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				ContextNode<DerivedCCG, FuncTy, CallTy> *
				CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForAlloc(
				const CallInfo &C) {
				auto AllocCallNode = AllocationCallToContextNodeMap.find(C);
				if (AllocCallNode != AllocationCallToContextNodeMap.end()) {
				return AllocCallNode->second;
				}
				return nullptr;
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				ContextNode<DerivedCCG, FuncTy, CallTy> *
				CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForStackId(
				uint64_t StackId) {
				auto StackEntryNode = StackEntryIdToContextNodeMap.find(StackId);
				if (StackEntryNode != StackEntryIdToContextNodeMap.end())
				return StackEntryNode->second;
				return nullptr;
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::unsetNodeForInst(
				const CallInfo &C) {
				AllocationCallToContextNodeMap.erase(C) ||
				NonAllocationCallToContextNodeMap.erase(C);
				assert(!AllocationCallToContextNodeMap.count(C) &&
				!NonAllocationCallToContextNodeMap.count(C));
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextNode::
				addOrUpdateCallerEdge(ContextNode *Caller, AllocationType AllocType,
				unsigned int ContextId) {
				for (auto &Edge : CallerEdges) {
				if (Edge->Caller == Caller) {
				Edge->AllocTypes |= (uint8_t)AllocType;
				Edge->getContextIds().insert(ContextId);
				return;
				}
				}
				DenseSet<uint32_t> ContextIdSet({ContextId});
				std::shared_ptr<ContextEdge> Edge = std::make_shared<ContextEdge>(
				this, Caller, (uint8_t)AllocType, ContextIdSet);
				CallerEdges.push_back(Edge);
				Caller->CalleeEdges.push_back(Edge);
				}
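
The edge bookkeeping in addOrUpdateCallerEdge relies on one edge object being shared by the callee's caller list and the caller's callee list. Below is a minimal sketch of that idea under simplifying assumptions: the `Node` and `Edge` types are hypothetical stand-ins for ContextNode and ContextEdge, `std::set` replaces `DenseSet`, and alloc types are passed as raw `uint8_t` bits.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>
#include <vector>

struct Node;

// Simplified stand-in for ContextEdge.
struct Edge {
  Node *Callee;
  Node *Caller;
  uint8_t AllocTypes;
  std::set<uint32_t> ContextIds;
};

// Simplified stand-in for ContextNode.
struct Node {
  std::vector<std::shared_ptr<Edge>> CalleeEdges;
  std::vector<std::shared_ptr<Edge>> CallerEdges;

  // Reuse the existing edge to Caller if there is one, otherwise create a
  // single edge object and register it on both endpoints.
  void addOrUpdateCallerEdge(Node *Caller, uint8_t AllocType,
                             uint32_t ContextId) {
    for (auto &E : CallerEdges) {
      if (E->Caller == Caller) {
        E->AllocTypes |= AllocType;
        E->ContextIds.insert(ContextId);
        return;
      }
    }
    auto E = std::make_shared<Edge>(Edge{this, Caller, AllocType, {ContextId}});
    CallerEdges.push_back(E);
    Caller->CalleeEdges.push_back(E);
  }
};
```

Because both vectors hold the same `shared_ptr`, an update made through one endpoint is visible from the other, and the edge is freed only once it has been erased from both lists.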

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				ContextEdge<DerivedCCG, FuncTy, CallTy> *
				CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextNode::
				findEdgeFromCallee(const ContextNode *Callee) {
				for (const auto &Edge : CalleeEdges)
				if (Edge->Callee == Callee)
				return Edge.get();
				return nullptr;
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextNode::
				eraseCalleeEdge(const ContextEdge *Edge) {
				auto EI =
				std::find_if(CalleeEdges.begin(), CalleeEdges.end(),
				[Edge](const std::shared_ptr<ContextEdge> &CalleeEdge) {
				return CalleeEdge.get() == Edge;
				});
				assert(EI != CalleeEdges.end());
				CalleeEdges.erase(EI);
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextNode::
				eraseCallerEdge(const ContextEdge *Edge) {
				auto EI =
				std::find_if(CallerEdges.begin(), CallerEdges.end(),
				[Edge](const std::shared_ptr<ContextEdge> &CallerEdge) {
				return CallerEdge.get() == Edge;
				});
				assert(EI != CallerEdges.end());
				CallerEdges.erase(EI);
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				uint8_t CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::computeAllocType(
				DenseSet<uint32_t> &ContextIds) {
				uint8_t BothTypes =
				(uint8_t)AllocationType::Cold | (uint8_t)AllocationType::NotCold;
				uint8_t AllocType = (uint8_t)AllocationType::None;
				for (auto Id : ContextIds) {
				AllocType |= (uint8_t)ContextIdToAllocationType[Id];
				// Bail early if alloc type reached both, no further refinement.
				if (AllocType == BothTypes)
				return AllocType;
				}
				return AllocType;
				}
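
The union-with-early-exit logic of computeAllocType can be sketched in isolation. This is an illustration under assumptions, not the patch's implementation: `unionAllocTypes` is a hypothetical free function, `std::set` and `std::map` replace the LLVM containers, the enum bit values are illustrative, and unknown ids are skipped rather than default-inserted as `operator[]` would do.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Assumed stand-in for llvm::AllocationType; the bit values are illustrative.
enum class AllocationType : uint8_t { None = 0, NotCold = 1, Cold = 2 };

// ORs together the alloc types recorded for the given context ids, stopping
// as soon as both Cold and NotCold have been seen.
uint8_t unionAllocTypes(const std::set<uint32_t> &ContextIds,
                        const std::map<uint32_t, AllocationType> &IdToType) {
  const uint8_t BothTypes =
      (uint8_t)AllocationType::Cold | (uint8_t)AllocationType::NotCold;
  uint8_t AllocType = (uint8_t)AllocationType::None;
  for (uint32_t Id : ContextIds) {
    auto It = IdToType.find(Id);
    if (It == IdToType.end())
      continue; // Sketch choice: ignore ids without a recorded type.
    AllocType |= (uint8_t)It->second;
    if (AllocType == BothTypes)
      break; // No further refinement is possible.
  }
  return AllocType;
}
```

The early exit matters for large context id sets: once the result is NotCold|Cold, the remaining ids cannot change it.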

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				ContextNode<DerivedCCG, FuncTy, CallTy> *
				CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::addAllocNode(CallInfo Call) {
				assert(!getNodeForAlloc(Call));
				NodeOwner.push_back(
				std::make_unique<ContextNode>(/*IsAllocation*/ true, Call));
				ContextNode *AllocNode = NodeOwner.back().get();
				AllocationCallToContextNodeMap[Call] = AllocNode;
				// Use LastContextId as a unique id for MIB allocation nodes.
				AllocNode->OrigStackOrAllocId = LastContextId;
				// Alloc type should be updated as we add in the MIBs. We should assert
				// afterwards that it is not still None.
				AllocNode->AllocTypes = (uint8_t)AllocationType::None;

				return AllocNode;
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				template <class NodeT, class IteratorT>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::addStackNodesForMIB(
				ContextNode *AllocNode, CallStack<NodeT, IteratorT> StackContext,
				CallStack<NodeT, IteratorT> &CallsiteContext, AllocationType AllocType) {
				ContextIdToAllocationType[++LastContextId] = AllocType;

				// Update alloc type and context ids for this MIB.
				AllocNode->AllocTypes |= (uint8_t)AllocType;
				AllocNode->ContextIds.insert(LastContextId);

				// Now add or update nodes for each stack id in alloc's context.
				// Later when processing the stack ids on non-alloc callsites we will adjust
				// for any inlining in the context.
				ContextNode *PrevNode = AllocNode;
				// Look for recursion (direct recursion should have been collapsed by
				// module summary analysis, here we should just be detecting mutual
				// recursion). Mark these nodes so we don't try to clone.
				SmallSet<uint64_t, 8> StackIdSet;
				// Skip any on the allocation call (inlining).
				for (auto ContextIter = StackContext.beginAfterSharedPrefix(CallsiteContext);
				ContextIter != StackContext.end(); ++ContextIter) {
				auto StackId = getStackId(*ContextIter);
				ContextNode *StackNode = getNodeForStackId(StackId);
				if (!StackNode) {
				NodeOwner.push_back(
				std::make_unique<ContextNode>(/*IsAllocation*/ false));
				StackNode = NodeOwner.back().get();
				StackEntryIdToContextNodeMap[StackId] = StackNode;
				StackNode->OrigStackOrAllocId = StackId;
				}
				auto Ins = StackIdSet.insert(StackId);
				if (!Ins.second)
				StackNode->Recursive = true;
				StackNode->ContextIds.insert(LastContextId);
				StackNode->AllocTypes |= (uint8_t)AllocType;
				PrevNode->addOrUpdateCallerEdge(StackNode, AllocType, LastContextId);
				PrevNode = StackNode;
				}
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				DenseSet<uint32_t>
				CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::duplicateContextIds(
				const DenseSet<uint32_t> &StackSequenceContextIds, ContextNode *FirstNode,
				ContextNode *LastNode) {
				// Replicate context ids with new ones (at the same position in the context
				// id vector). Set up a map entry for the below traversal.
				DenseSet<uint32_t> NewContextIds;
				DenseMap<uint32_t, uint32_t> OldToNewContextIds;
				for (auto OldId : StackSequenceContextIds) {
				NewContextIds.insert(++LastContextId);
				OldToNewContextIds[OldId] = LastContextId;
				assert(ContextIdToAllocationType.count(OldId));
				ContextIdToAllocationType[LastContextId] = ContextIdToAllocationType[OldId];
				}

				auto Propagate = [&](ContextNode *PrevNode, bool TowardsCallee,
				const DenseSet<uint32_t> &PrevIds,
				DenseSet<const ContextNode *> &Visited,
				auto &&Propagate) -> void {
				for (auto &Edge :
				(TowardsCallee ? PrevNode->CalleeEdges : PrevNode->CallerEdges)) {
				DenseSet<uint32_t> CurIds =
				set_intersection(PrevIds, Edge->getContextIds());
				// Done if no ids found along this edge.
				if (CurIds.empty())
				continue;
				// Map curids to new ids in edge, and in the caller/callee. Then
				// recurse using cur ids.
				ContextNode *NextNode = TowardsCallee ? Edge->Callee : Edge->Caller;
				auto Inserted = Visited.insert(NextNode);
				if (!Inserted.second)
				continue;
				[Inline comment, Done] tejohnson (author): I discovered when testing with a larger app and more graph validation enabled that this early return is not correct. We can have different subsets of duplicate context nodes being propagated along different edges to the same node, and so we were stopping the propagation early. This meant both an insane graph (context ids of node didn't match those of caller and callee edges) but also prevented some cloning. Removing this however leads to long compile time. I redesigned the context duplication and the caller which is updateStackNodes to split this handling into 3:
				1. Walk the calls and perform all necessary context id duplication, saving the info in a map (in updateStackNodes and a modified duplicateContextIds).
				2. Propagate the context ids across the full graph in a single pass (in a new propagateDuplicateContextIds called from updateStackNodes).
				3. Do the post order traversal to generate new nodes for inlined call chains, moving the context ids determined earlier (which might be duplicates) to the new node (in updateStackNodes).
				This allowed handling quite a few more cold calls in my large application, with smaller compile time than the patch even without the fix to remove this early termination. I'll upload the new version of this handling in a few minutes.
				for (auto Id : CurIds) {
				auto NewId = OldToNewContextIds[Id];
				Edge->getContextIds().insert(NewId);
				NextNode->ContextIds.insert(NewId);
				}
				Propagate(NextNode, TowardsCallee, CurIds, Visited, Propagate);
				}
				};

				// Walk towards allocation nodes from FirstNode, adding the new context ids in
				// the sets containing the old context ids.
				DenseSet<const ContextNode *> Visited;
				Propagate(FirstNode, /*TowardsCallee*/ true, StackSequenceContextIds, Visited,
				Propagate);

				// Walk towards root callers from LastNode, adding the new context ids in
				// the sets containing the old context ids.
				Visited.clear();
				Propagate(LastNode, /*TowardsCallee*/ false, StackSequenceContextIds, Visited,
				Propagate);

				return NewContextIds;
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::connectNewNode(
				ContextNode *NewNode, ContextNode *OrigNode, bool TowardsCallee) {
				// Make a copy of the context ids, since this will be adjusted below as they
				// are moved.
				DenseSet<uint32_t> RemainingContextIds = NewNode->ContextIds;
				auto &OrigEdges =
				TowardsCallee ? OrigNode->CalleeEdges : OrigNode->CallerEdges;
				// Increment iterator in loop so that we can remove edges as needed.
				for (auto EI = OrigEdges.begin(); EI != OrigEdges.end();) {
				auto Edge = *EI;
				// Remove any matching context ids from Edge, return set that were found and
				// removed, these are the new edge's context ids. Also update the remaining
				// (not found ids).
				DenseSet<uint32_t> NewEdgeContextIds, NotFoundContextIds;
				set_subtract(Edge->getContextIds(), RemainingContextIds, NewEdgeContextIds,
				NotFoundContextIds);
				RemainingContextIds.swap(NotFoundContextIds);
				// If no matching context ids for this edge, skip it.
				if (NewEdgeContextIds.empty()) {
				++EI;
				continue;
				}
				[Inline comment, Done] snehasish: missing = in the comment, i.e. /*TowardsCallee=*/. There are a few other cases in this patch. https://clang.llvm.org/extra/clang-tidy/checks/bugprone/argument-comment.html
				if (TowardsCallee) {
				auto NewEdge = std::make_shared<ContextEdge>(
				Edge->Callee, NewNode, computeAllocType(NewEdgeContextIds),
				NewEdgeContextIds);
				NewNode->CalleeEdges.push_back(NewEdge);
				NewEdge->Callee->CallerEdges.push_back(NewEdge);
				} else {
				auto NewEdge = std::make_shared<ContextEdge>(
				NewNode, Edge->Caller, computeAllocType(NewEdgeContextIds),
				NewEdgeContextIds);
				NewNode->CallerEdges.push_back(NewEdge);
				NewEdge->Caller->CalleeEdges.push_back(NewEdge);
				}
				// Remove old edge if context ids empty.
				if (Edge->getContextIds().empty()) {
				if (TowardsCallee) {
				Edge->Callee->eraseCallerEdge(Edge.get());
				EI = OrigNode->CalleeEdges.erase(EI);
				} else {
				Edge->Caller->eraseCalleeEdge(Edge.get());
				EI = OrigNode->CallerEdges.erase(EI);
				}
				continue;
				}
				++EI;
				}
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::updateStackNodes() {
				[Inline comment, Not Done] snehasish: Can we use shared_ptr for the edges so that we don't have to manually track ownership? (also mentioned in the other patch)
				[Inline comment, Done] tejohnson (author): Yes we could. I have some concern about the additional overhead and whether it is worth it given that tracking these didn't end up being too difficult. I'll collect some measurements, it might be small given the overhead of all the rest of the graph memory.
				[Inline comment, Done] tejohnson (author): I have been playing around with using shared_ptr for the ContextEdges. For some reason, the memory is really blowing up. I expect some increase, but not what I am seeing. I confirmed I am using make_shared which should minimize the overhead, and I don't see where I would be creating extra copies that live somewhere longer than expected, but I feel like I must be doing something wrong. Still looking...
				[Inline comment, Done] tejohnson (author): I found and fixed a bug causing this blow up. The shared_ptr memory seems reasonable.
				// Map of stack id to all calls with that as the last (outermost caller)
				// callsite id that has a context node (some might not due to pruning
				// performed during matching of the allocation profile contexts).
				// The CallContextInfo contains the Call and a list of its stack ids with
				// ContextNodes, as well as the function containing Call.
				using CallContextInfo =
				std::tuple<CallTy, std::vector<uint64_t>, const FuncTy *>;
				DenseMap<uint64_t, std::vector<CallContextInfo>> StackIdToMatchingCalls;
				for (auto &FuncEntry : FuncToCallsWithMetadata) {
				auto *Func = FuncEntry.first;
				for (auto Call : FuncEntry.second) {
				// Ignore allocations, already handled.
				if (AllocationCallToContextNodeMap.count(Call))
				continue;
				auto StackIdsWithContextNodes =
				getStackIdsWithContextNodesForCall(Call.call());
				// If there were no nodes created for MIBs on allocs (maybe this was in
				// the unambiguous part of the MIB stack that was pruned), ignore.
				if (StackIdsWithContextNodes.empty())
				continue;
				// Otherwise, record this Call along with the list of ids for the last
				// (outermost caller) stack id with a node.
				StackIdToMatchingCalls[StackIdsWithContextNodes.back()].push_back(
				{Call.call(), StackIdsWithContextNodes, Func});
				}
				}

				// Now perform a post-order traversal over the graph, starting with the
				// allocation nodes, essentially processing nodes from callers to callees.
				// For any that contains an id in the map, update the graph to contain new
				// nodes representing any inlining at interior callsites. Note we move the
				// associated context ids over to the new nodes.
				auto ProcessNode = [&](ContextNode *Node,
				DenseSet<const ContextNode *> &Visited,
				auto &&ProcessNode) {
				auto Inserted = Visited.insert(Node);
				if (!Inserted.second)
				return;
				// Post order traversal. Iterate over a copy since we may add nodes and
				// therefore new callers during the recursive call, invalidating any
				// iterator over the original edge vector. We don't need to process these
				// new nodes as they were already processed on creation.
				auto CallerEdges = Node->CallerEdges;
				for (auto &Edge : CallerEdges) {
				// Skip any that have been removed during the recursion.
				if (!Edge)
				continue;
				ProcessNode(Edge->Caller, Visited, ProcessNode);
				}

				// Ignore this node if it is for an allocation or we didn't record any
				// stack id lists ending at it.
				if (Node->IsAllocation ||
				!StackIdToMatchingCalls.count(Node->OrigStackOrAllocId))
				return;

				auto Calls = StackIdToMatchingCalls[Node->OrigStackOrAllocId];
				// Handle the simple case first. A single call with a single stack id.
				// In this case there is no need to create any new context nodes, simply
				// assign the context node for stack id to this Call.
				if (Calls.size() == 1) {
				auto &[Call, Ids, Func] = Calls[0];
				if (Ids.size() == 1) {
				// It should be this Node
				assert(Node == getNodeForStackId(Ids[0]));
				if (Node->Recursive)
				return;
				Node->setCall(Call);
				[Inline comment, Done] snehasish: This lambda is doing a lot of work, can we move this out to its own method? It's a little hard to read right now because of the size and doing so would make it a little easier to read.
				[Inline comment, Done] tejohnson (author): Refactored out and now named assignStackNodesPostOrder
				NonAllocationCallToContextNodeMap[Call] = Node;
				NodeToCallingFunc[Node] = Func;
				return;
				}
				}
				// In order to do the best and maximal matching of
				// inlined calls to context node sequences we will sort the vectors of stack
				// ids in reverse order of length, and within each length, lexicographically
				// by stack id. The latter is so that we can specially handle calls that
				// have identical stack id sequences (either due to cloning or artificially
				// because of the MIB context pruning).
				std::sort(Calls.begin(), Calls.end(),
				[](const CallContextInfo &A, const CallContextInfo &B) {
				auto &[CallA, IdsA, FuncA] = A;
				auto &[CallB, IdsB, FuncB] = B;
				return IdsA.size() > IdsB.size() ||
				(IdsA.size() == IdsB.size() && IdsA < IdsB);
				});
				for (unsigned I = 0; I < Calls.size(); I++) {
				auto &[Call, Ids, Func] = Calls[I];
				// First compute the context ids for this stack id sequence (the
				// intersection of the context ids of the corresponding nodes).
				DenseSet<uint32_t> StackSequenceContextIds;
				ContextNode *CurNode = nullptr;
				[Inline comment, Done] snehasish: auto& to avoid a copy?
				ContextNode *PrevNode = nullptr;
				ContextNode *FirstNode = nullptr;
				bool Skip = false;
				for (auto Id : Ids) {
				CurNode = getNodeForStackId(Id);
				// We should only have kept stack ids that had nodes.
				assert(CurNode);

				if (CurNode->Recursive) {
				Skip = true;
				break;
				}

				// If this is the first node, simply initialize the context ids set.
				// We will subsequently refine the context ids by computing the
				// intersection along all edges (initializing with the first node's
				// context ids handles the case where there is a single id/node).
				if (!FirstNode) {
				FirstNode = PrevNode = CurNode;
				[Inline comment, Done] snehasish: nit: s/reverse/descending?
				StackSequenceContextIds = CurNode->ContextIds;
				continue;
				}

				auto *Edge = CurNode->findEdgeFromCallee(PrevNode);
				// If there is no edge then the nodes belong to different MIB contexts,
				// and we should skip this inlined context sequence.
				if (!Edge) {
				Skip = true;
				break;
				}
				PrevNode = CurNode;

				// Update the context ids, which is the intersection of the ids along
				// all edges in the sequence.
				set_intersect(StackSequenceContextIds, Edge->getContextIds());

				// If we now have no context ids for clone, skip this call.
				if (StackSequenceContextIds.empty()) {
				Skip = true;
				break;
				}
				}
				if (Skip)
				continue;

				ContextNode *LastNode = CurNode;

				// If some of this call's stack ids did not have corresponding nodes (due
				// to pruning), don't include any context ids for contexts that extend
				// beyond these nodes. Otherwise we would be matching part of unrelated /
				// not fully matching stack contexts. To do this, subtract any context ids
				// found in caller nodes of the last node found above.
				if (Ids.back() != getLastStackId(Call)) {
				assert(CurNode);
				for (auto &PE : CurNode->CallerEdges) {
				set_subtract(StackSequenceContextIds, PE->getContextIds());
				if (StackSequenceContextIds.empty())
				break;
				}
				// If we now have no context ids for clone, skip this call.
				[Inline comment, Done] snehasish: I didn't understand how we can have this case with different MIB contexts here. Can you elaborate in the comment?
				if (StackSequenceContextIds.empty())
				continue;
				}

				// Check if the next set of stack ids is the same (since the Calls vector
				// of tuples is sorted by the stack ids we can just look at the next one).
				bool DuplicateStackIds = false;
				if (I + 1 < Calls.size()) {
				auto NextIds = std::get<1>(Calls[I + 1]);
				DuplicateStackIds = Ids == NextIds;
				}

				// Create new context node. If we don't have duplicate stack ids, then we
				// can move all the context ids computed for the original node sequence
				// onto the clone. If there are duplicate calls with the same stack ids
				// then we synthesize new context ids that are duplicates of the
				// originals. These new context ids need to be propagated to all
				// nodes/edges on the same contexts, other than those being subsumed
				// by the clone.
				NodeOwner.push_back(
				std::make_unique<ContextNode>(/*IsAllocation*/ false, Call));
				snehasish: I think LastNode is redundant and CurNode can be used in L1016 and L1024.
				tejohnson: We could, but it is clearer IMO to use a name that corresponds to the meaning at those uses below. In any case, this code has been changed and we now assign LastNode more directly.
				ContextNode *NewNode = NodeOwner.back().get();
				NodeToCallingFunc[NewNode] = Func;
				NonAllocationCallToContextNodeMap[Call] = NewNode;
				NewNode->ContextIds = DuplicateStackIds
				? duplicateContextIds(StackSequenceContextIds,
				FirstNode, LastNode)
				: StackSequenceContextIds;
				NewNode->AllocTypes = computeAllocType(NewNode->ContextIds);

				// Connect to callees of innermost stack frame in inlined call chain.
				connectNewNode(NewNode, FirstNode, /*TowardsCallee*/ true);

				// Connect to callers of outermost stack frame in inlined call chain.
				connectNewNode(NewNode, LastNode, /*TowardsCallee*/ false);

				// If we didn't create new context ids for clone, we need to remove those
				// moved to the clone from the original nodes.
				if (!DuplicateStackIds) {
				ContextNode *PrevNode = nullptr;
				for (auto Id : Ids) {
				ContextNode *CurNode = getNodeForStackId(Id);
				// We should only have kept stack ids that had nodes.
				assert(CurNode);

				// Remove the context ids moved to NewNode from CurNode, and the
				// edge from the prior node.
				set_subtract(CurNode->ContextIds, NewNode->ContextIds);
				if (PrevNode) {
				auto *PrevEdge = CurNode->findEdgeFromCallee(PrevNode);
				assert(PrevEdge);
				set_subtract(PrevEdge->getContextIds(), NewNode->ContextIds);
				if (PrevEdge->getContextIds().empty()) {
				PrevNode->eraseCallerEdge(PrevEdge);
				CurNode->eraseCalleeEdge(PrevEdge);
				}
				}
				PrevNode = CurNode;
				}
				}
				}
				};

				// Actual post-order traversal.
				DenseSet<const ContextNode *> Visited;
				for (auto &Entry : AllocationCallToContextNodeMap)
				ProcessNode(Entry.second, Visited, ProcessNode);
				}
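
The intersection step at the top of this hunk can be pictured with standard containers. This is a minimal sketch, not the patch's code: `intersectAlongEdges` is a hypothetical name, and `std::set<uint32_t>` stands in for the `DenseSet` of context ids used by the graph.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

using IdSet = std::set<uint32_t>;

// Hypothetical helper (not from the patch): returns the context ids present
// on every edge of the sequence. An empty result means the call would be
// skipped, mirroring the early exit in the loop above.
IdSet intersectAlongEdges(const std::vector<IdSet> &EdgeIds) {
  if (EdgeIds.empty())
    return {};
  IdSet Result = EdgeIds.front();
  // Stop as soon as the running intersection becomes empty.
  for (std::size_t I = 1; I < EdgeIds.size() && !Result.empty(); ++I) {
    IdSet Next;
    for (uint32_t Id : Result)
      if (EdgeIds[I].count(Id))
        Next.insert(Id);
    Result = std::move(Next);
  }
  return Result;
}
```

With edge id sets {1,2,3}, {2,3,4} and {3,5}, only context id 3 survives, so a clone for that call sequence would carry just that one context.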

				uint64_t ModuleCallsiteContextGraph::getLastStackId(Instruction *Call) {
				CallStack<MDNode, MDNode::op_iterator> CallsiteContext(
				Call->getMetadata(LLVMContext::MD_callsite));
				return CallsiteContext.back();
				}

				uint64_t IndexCallsiteContextGraph::getLastStackId(IndexCall &Call) {
				assert(Call.is<CallsiteInfo *>());
				CallStack<CallsiteInfo, SmallVector<unsigned>::const_iterator>
				CallsiteContext(Call.dyn_cast<CallsiteInfo *>());
				// Need to convert index into stack id.
				return Index.getStackIdAtIndex(CallsiteContext.back());
				}

				static std::string getPGHOFuncName(Twine Base, unsigned CloneNo) {
				if (!CloneNo)
				return Base.str();
				return (Base + ".pgho." + std::to_string(CloneNo)).str();
				}

				std::string ModuleCallsiteContextGraph::getLabel(const Function *Func,
				const Instruction *Call,
				unsigned CloneNo) const {
				return (Call->getFunction()->getName() + " -\\> " +
				cast<CallBase>(Call)->getCalledFunction()->getName())
				.str();
				}

				std::string IndexCallsiteContextGraph::getLabel(const FunctionSummary *Func,
				const IndexCall &Call,
				unsigned CloneNo) const {
				auto VI = FSToVIMap.find(Func);
				assert(VI != FSToVIMap.end());
				if (Call.is<AllocInfo *>())
				return (VI->second.name() + " -\\> alloc").str();
				else {
				auto Callsite = Call.dyn_cast<CallsiteInfo *>();
				return (VI->second.name() + " -\\> " +
				getPGHOFuncName(Callsite->Callee.name(), Callsite->Clones[CloneNo]))
				.str();
				}
				}

				std::vector<uint64_t>
				ModuleCallsiteContextGraph::getStackIdsWithContextNodesForCall(
				Instruction *Call) {
				CallStack<MDNode, MDNode::op_iterator> CallsiteContext(
				Call->getMetadata(LLVMContext::MD_callsite));
				return getStackIdsWithContextNodes<MDNode, MDNode::op_iterator>(
				CallsiteContext);
				}

				std::vector<uint64_t>
				IndexCallsiteContextGraph::getStackIdsWithContextNodesForCall(IndexCall &Call) {
				assert(Call.is<CallsiteInfo *>());
				CallStack<CallsiteInfo, SmallVector<unsigned>::const_iterator>
				CallsiteContext(Call.dyn_cast<CallsiteInfo *>());
				return getStackIdsWithContextNodes<CallsiteInfo,
				SmallVector<unsigned>::const_iterator>(
				CallsiteContext);
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				template <class NodeT, class IteratorT>
				std::vector<uint64_t>
				CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getStackIdsWithContextNodes(
				CallStack<NodeT, IteratorT> &CallsiteContext) {
				std::vector<uint64_t> StackIds;
				for (auto IdOrIndex : CallsiteContext) {
				auto StackId = getStackId(IdOrIndex);
				ContextNode *Node = getNodeForStackId(StackId);
				if (!Node)
				break;
				StackIds.push_back(StackId);
				}
				return StackIds;
				}
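
The prefix behaviour of `getStackIdsWithContextNodes` (stop at the first stack id that no longer has a node) can be sketched outside LLVM as follows. The function name and the `KnownIds` set are assumptions for illustration; in the patch the lookup is `getNodeForStackId`.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the graph lookup: KnownIds plays the role of
// "stack ids that still have a context node" (for example after pruning).
std::vector<uint64_t>
stackIdPrefixWithNodes(const std::vector<uint64_t> &CallStackIds,
                       const std::unordered_set<uint64_t> &KnownIds) {
  std::vector<uint64_t> Prefix;
  for (uint64_t Id : CallStackIds) {
    // The first missing node ends the walk; later ids are ignored even if
    // they do have nodes.
    if (!KnownIds.count(Id))
      break;
    Prefix.push_back(Id);
  }
  return Prefix;
}
```

Because the result is a prefix rather than a filtered list, a caller can compare its last element against the call's last stack id to detect truncation, which is what the `Ids.back() != getLastStackId(Call)` check earlier in the file relies on.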
				snehasish: Twine(CloneNo) should incur fewer conversions. "pgho" suffix would change if you consider the comment about pgho/memprof above.

				ModuleCallsiteContextGraph::ModuleCallsiteContextGraph(Module &M) : Mod(M) {
				for (auto &F : M) {
				for (auto &BB : F) {
				for (auto &I : BB) {
				if (auto *MemProfMD = I.getMetadata(LLVMContext::MD_memprof)) {
				FuncToCallsWithMetadata[&F].push_back(&I);
				auto *AllocNode = addAllocNode(&I);
				CallStack<MDNode, MDNode::op_iterator> CallsiteContext(
				I.getMetadata(LLVMContext::MD_callsite));
				// Add all of the MIBs and their stack nodes.
				for (auto &MDOp : MemProfMD->operands()) {
				auto *MIBMD = cast<const MDNode>(MDOp);
				MDNode *StackNode = getMIBStackNode(MIBMD);
				assert(StackNode);
				addStackNodesForMIB<MDNode, MDNode::op_iterator>(
				AllocNode, StackNode, CallsiteContext, getMIBAllocType(MIBMD));
				}
				assert(AllocNode->AllocTypes != (uint8_t)AllocationType::None);
				NodeToCallingFunc[AllocNode] = &F;
				// Memprof and callsite metadata on memory allocations no longer
				// needed.
				I.setMetadata(LLVMContext::MD_memprof, nullptr);
				I.setMetadata(LLVMContext::MD_callsite, nullptr);
				}
				// For callsite metadata, add to list for this function for later use.
				else if (I.getMetadata(LLVMContext::MD_callsite))
				FuncToCallsWithMetadata[&F].push_back(&I);
				}
				}
				}

				if (DumpCCG) {
				dbgs() << "CCG before updating call stack chains:\n";
				dbgs() << *this;
				}

				if (ExportToDot)
				exportToDot("ccg.prestackupdate.dot");

				updateStackNodes();

				// Strip off remaining callsite metadata, no longer needed.
				for (auto &FuncEntry : FuncToCallsWithMetadata)
				for (auto Call : FuncEntry.second)
				Call.call()->setMetadata(LLVMContext::MD_callsite, nullptr);

				handleCallsitesWithMultipleTargets();
				}

				IndexCallsiteContextGraph::IndexCallsiteContextGraph(
				ModuleSummaryIndex &Index,
				function_ref<bool(GlobalValue::GUID, const GlobalValueSummary *)>
				isPrevailing)
				: Index(Index) {
				for (auto &I : Index) {
				auto VI = Index.getValueInfo(I);
				for (auto &S : VI.getSummaryList()) {
				// We should only add the prevailing nodes. Otherwise we may try to clone
				// in a weak copy that won't be linked (and may be different than the
				// prevailing version).
				// We only keep the memprof summary on the prevailing copy now when
				// building the combined index, as a space optimization, however don't
				// rely on this optimization. The linker doesn't resolve local linkage
				// values so don't check whether those are prevailing.
				if (!GlobalValue::isLocalLinkage(S->linkage()) &&
				!isPrevailing(VI.getGUID(), S.get()))
				continue;
				auto *FS = dyn_cast<FunctionSummary>(S.get());
				if (!FS)
				continue;
				if (!FS->allocs().empty()) {
				for (auto &AN : FS->mutableAllocs()) {
				// This can happen because of recursion elimination handling that
				// currently exists in ModuleSummaryAnalysis. Skip these for now.
				// We still added them to the summary because we need to be able to
				// correlate properly in applyImport in the backends.
				if (AN.MIBs.empty())
				continue;
				FuncToCallsWithMetadata[FS].push_back({&AN});
				auto *AllocNode = addAllocNode({&AN});
				// Pass an empty CallStack to the CallsiteContext (second)
				// parameter, since for ThinLTO we already collapsed out the inlined
				// stack ids on the allocation call during ModuleSummaryAnalysis.
				CallStack<MIBInfo, SmallVector<unsigned>::const_iterator>
				EmptyContext;
				// Now add all of the MIBs and their stack nodes.
				for (auto &MIB : AN.MIBs)
				addStackNodesForMIB<MIBInfo, SmallVector<unsigned>::const_iterator>(
				AllocNode, &MIB, EmptyContext, MIB.AllocType);
				assert(AllocNode->AllocTypes != (uint8_t)AllocationType::None);
				// Initialize version 0 on the summary alloc node to the current alloc
				// type, unless it has both types in which case make it default, so
				// that in the case where we aren't able to clone the original version
				// always ends up with the default allocation behavior.
				AN.Versions[0] = (uint8_t)allocTypeToUse(AllocNode->AllocTypes);
				NodeToCallingFunc[AllocNode] = FS;
				}
				}
				// For callsite metadata, add to list for this function for later use.
				if (!FS->callsites().empty())
				for (auto &SN : FS->mutableCallsites())
				FuncToCallsWithMetadata[FS].push_back({&SN});

				if (!FS->allocs().empty() || !FS->callsites().empty())
				FSToVIMap[FS] = VI;
				}
				}

				if (DumpCCG) {
				snehasish: Can we split this patch to only implement the basic version (non-ThinLTO)?
				tejohnson: Will do for commit - as we discussed offline, at this point I may do so after the review which is already in progress here.
				dbgs() << "CCG before updating call stack chains:\n";
				dbgs() << *this;
				}

				if (ExportToDot)
				exportToDot("ccg.prestackupdate.dot");

				updateStackNodes();

				handleCallsitesWithMultipleTargets();
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy,
				CallTy>::handleCallsitesWithMultipleTargets() {
				// Look for and workaround callsites that call multiple functions.
				// This can happen for indirect calls, which needs better handling, and in
				// more rare cases (e.g. macro expansion).
				// TODO: To fix this for indirect calls we will want to perform speculative
				// devirtualization using either the normal PGO info with ICP, or using the
				// information in the profiled PGHO contexts. We can do this prior to
				// this transformation for regular LTO, and for ThinLTO we can simulate that
				// effect in the summary and perform the actual speculative devirtualization
				// while cloning in the ThinLTO backend.
				for (auto Entry = NonAllocationCallToContextNodeMap.begin();
				Entry != NonAllocationCallToContextNodeMap.end();) {
				auto *Node = Entry->second;
				assert(Node->Clones.empty());
				// Check all node callees and see if in the same function.
				bool Removed = false;
				auto Call = Node->Call.call();
				for (auto &Edge : Node->CalleeEdges) {
				if (!Edge->Callee->hasCall())
				continue;
				assert(NodeToCallingFunc.count(Edge->Callee));
				// Check if the called function matches that of the callee node.
				if (calleeMatchesFunc(Call, NodeToCallingFunc[Edge->Callee]))
				continue;
				// Work around by setting Node to have a null call, so it gets
				// skipped during cloning. Otherwise assignFunctions will assert
				// because its data structures are not designed to handle this case.
				Entry = NonAllocationCallToContextNodeMap.erase(Entry);
				Node->setCall(CallInfo());
				Removed = true;
				break;
				}
				if (!Removed)
				Entry++;
				}
				}
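
The loop above erases map entries while iterating, advancing the iterator by hand only when nothing was removed. A minimal sketch of that idiom with `std::map` follows; `dropFlagged` and the boolean flag are hypothetical and only illustrate the iteration pattern, not the callee-matching logic.

```cpp
#include <map>
#include <string>

// Hypothetical example: drop every entry whose value is flagged. The loop
// header has no increment; each branch decides how the iterator advances.
std::map<std::string, bool> dropFlagged(std::map<std::string, bool> Entries) {
  for (auto It = Entries.begin(); It != Entries.end();) {
    if (It->second)
      It = Entries.erase(It); // erase returns the iterator past the removed entry
    else
      ++It;
  }
  return Entries;
}
```

Incrementing after an erase would skip the element that follows the removed one, which is why the patch tracks `Removed` and only executes `Entry++` when no erase happened.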

				uint64_t ModuleCallsiteContextGraph::getStackId(uint64_t IdOrIndex) const {
				// In the Module (IR) case this is already the Id.
				return IdOrIndex;
				}

				uint64_t IndexCallsiteContextGraph::getStackId(uint64_t IdOrIndex) const {
				// In the Index case this is an index into the stack id list in the summary
				// index, convert it to an Id.
				return Index.getStackIdAtIndex(IdOrIndex);
				}

				bool ModuleCallsiteContextGraph::calleeMatchesFunc(Instruction *Call,
				const Function *Func) {
				auto *CB = dyn_cast<CallBase>(Call);
				if (!CB->getCalledOperand())
				return false;
				auto CalleeVal = CB->getCalledOperand()->stripPointerCasts();
				auto *CalleeFunc = dyn_cast<Function>(CalleeVal);
				if (CalleeFunc == Func)
				return true;
				auto *Alias = dyn_cast<GlobalAlias>(CalleeVal);
				return Alias && Alias->getAliasee() == Func;
				}

				bool IndexCallsiteContextGraph::calleeMatchesFunc(IndexCall &Call,
				const FunctionSummary *Func) {
				ValueInfo Callee = Call.dyn_cast<CallsiteInfo *>()->Callee;
				// If there is no summary list then this is a call to an externally defined
				// symbol.
				AliasSummary *Alias =
				Callee.getSummaryList().empty()
				? nullptr
				: dyn_cast<AliasSummary>(Callee.getSummaryList()[0].get());
				assert(FSToVIMap.count(Func));
				return Callee == FSToVIMap[Func] ||
				// If callee is an alias, check the aliasee, since only function
				// summary base objects will contain the stack node summaries and thus
				// get a context node.
				(Alias && Alias->getAliaseeVI() == FSToVIMap[Func]);
				}

				static std::string getAllocTypeString(uint8_t AllocTypes) {
				if (!AllocTypes)
				return "None";
				std::string Str;
				if (AllocTypes & (uint8_t)AllocationType::NotCold)
				Str += "NotCold";
				if (AllocTypes & (uint8_t)AllocationType::Cold)
				Str += "Cold";
				return Str;
				}
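
`getAllocTypeString` treats `AllocTypes` as a bitmask, so a node reached by both cold and not-cold contexts prints as "NotColdCold". The sketch below reproduces that behaviour in isolation. The enum values are an assumed encoding for illustration, and `AllocKind`, `combineAllocKinds` and `allocKindString` are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumed encoding, for illustration only: one bit per allocation type.
enum class AllocKind : uint8_t { None = 0, NotCold = 1, Cold = 2 };

// Hypothetical counterpart of computing a node's type from its contexts:
// OR together the type of every context that reaches the node.
uint8_t combineAllocKinds(const std::vector<AllocKind> &Kinds) {
  uint8_t Mask = static_cast<uint8_t>(AllocKind::None);
  for (AllocKind K : Kinds)
    Mask |= static_cast<uint8_t>(K);
  return Mask;
}

// Same printing rule as the function above: "None" for an empty mask,
// otherwise the names of the set bits concatenated without a separator.
std::string allocKindString(uint8_t Mask) {
  if (!Mask)
    return "None";
  std::string Str;
  if (Mask & static_cast<uint8_t>(AllocKind::NotCold))
    Str += "NotCold";
  if (Mask & static_cast<uint8_t>(AllocKind::Cold))
    Str += "Cold";
  return Str;
}
```

A mask with both bits set is the ambiguous case that context disambiguation is meant to resolve by cloning.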

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextNode::dump()
				const {
				print(dbgs());
				dbgs() << "\n";
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextNode::print(
				raw_ostream &OS) const {
				OS << "Node " << this << "\n";
				OS << "\t";
				printCall(OS);
				if (Recursive)
				OS << " (recursive)";
				OS << "\n";
				OS << "\tAllocTypes: " << getAllocTypeString(AllocTypes) << "\n";
				OS << "\tContextIds:";
				std::vector<uint32_t> SortedIds(ContextIds.begin(), ContextIds.end());
				std::sort(SortedIds.begin(), SortedIds.end());
				for (auto Id : SortedIds)
				OS << " " << Id;
				OS << "\n";
				OS << "\tCalleeEdges:\n";
				for (auto &Edge : CalleeEdges)
				OS << "\t\t" << *Edge << "\n";
				OS << "\tCallerEdges:\n";
				for (auto &Edge : CallerEdges)
				OS << "\t\t" << *Edge << "\n";
				if (!Clones.empty()) {
				OS << "\tClones: ";
				FieldSeparator FS;
				for (auto *Clone : Clones)
				OS << FS << Clone;
				OS << "\n";
				} else if (CloneOf) {
				OS << "\tClone of " << CloneOf << "\n";
				}
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextEdge::dump()
				const {
				print(dbgs());
				dbgs() << "\n";
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextEdge::print(
				raw_ostream &OS) const {
				OS << "Edge from Callee " << Callee << " to Caller: " << Caller
				<< " AllocTypes: " << getAllocTypeString(AllocTypes);
				OS << " ContextIds:";
				std::vector<uint32_t> SortedIds(ContextIds.begin(), ContextIds.end());
				std::sort(SortedIds.begin(), SortedIds.end());
				for (auto Id : SortedIds)
				OS << " " << Id;
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				static void printNodesRecursively(
				raw_ostream &OS, const ContextNode<DerivedCCG, FuncTy, CallTy> *Node,
				DenseSet<const ContextNode<DerivedCCG, FuncTy, CallTy> *> &Visited) {
				auto Inserted = Visited.insert(Node);
				if (!Inserted.second)
				return;
				Node->print(OS);
				OS << "\n";
				for (auto &Edge : Node->CallerEdges)
				printNodesRecursively<DerivedCCG, FuncTy, CallTy>(OS, Edge->Caller,
				Visited);
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::dump() const {
				print(dbgs());
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::print(
				raw_ostream &OS) const {
				OS << "Callsite Context Graph:\n";
				DenseSet<const ContextNode *> Visited;
				for (auto &Entry : AllocationCallToContextNodeMap) {
				printNodesRecursively<DerivedCCG, FuncTy, CallTy>(OS, Entry.second,
				Visited);
				for (auto *Clone : Entry.second->Clones)
				printNodesRecursively<DerivedCCG, FuncTy, CallTy>(OS, Clone, Visited);
				}
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				static void checkEdge(
				const std::shared_ptr<ContextEdge<DerivedCCG, FuncTy, CallTy>> &Edge) {
				// Confirm that alloc type is not None and that we have at least one context
				// id.
				assert(Edge->AllocTypes != (uint8_t)AllocationType::None);
				assert(!Edge->ContextIds.empty());
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				static void checkNode(const ContextNode<DerivedCCG, FuncTy, CallTy> *Node,
				bool CheckEdges = false) {
				// Node's context ids should be the union of both its callee and caller edge
				// context ids.
				if (Node->CallerEdges.size()) {
				auto EI = Node->CallerEdges.begin();
				auto &FirstEdge = *EI;
				DenseSet<uint32_t> CallerEdgeContextIds(FirstEdge->ContextIds);
				for (EI++; EI != Node->CallerEdges.end(); EI++) {
				const auto &Edge = *EI;
				if (CheckEdges)
				checkEdge<DerivedCCG, FuncTy, CallTy>(Edge);
				set_union(CallerEdgeContextIds, Edge->ContextIds);
				}
				// Node can have more context ids than callers if some contexts terminate at
				// node and some are longer.
				assert(Node->ContextIds == CallerEdgeContextIds ||
				set_is_subset(CallerEdgeContextIds, Node->ContextIds));
				}
				if (Node->CalleeEdges.size()) {
				auto EI = Node->CalleeEdges.begin();
				auto FirstEdge = *EI;
				DenseSet<uint32_t> CalleeEdgeContextIds(FirstEdge->ContextIds);
				for (EI++; EI != Node->CalleeEdges.end(); EI++) {
				const auto &Edge = *EI;
				if (CheckEdges)
				checkEdge<DerivedCCG, FuncTy, CallTy>(Edge);
				set_union(CalleeEdgeContextIds, Edge->ContextIds);
				}
				assert(Node->ContextIds == CalleeEdgeContextIds);
				}
				}
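
The two invariants asserted by `checkNode` differ slightly: the callee-edge ids must equal the node's ids, while the caller-edge ids only need to be a subset, since some contexts may end at the node. A minimal sketch with `std::set`, using the hypothetical names `unionOf` and `nodeIdsConsistent`:

```cpp
#include <algorithm>
#include <cstdint>
#include <set>
#include <vector>

using IdSet = std::set<uint32_t>;

// Union of the context ids carried by a list of edges.
static IdSet unionOf(const std::vector<IdSet> &Sets) {
  IdSet Out;
  for (const IdSet &S : Sets)
    Out.insert(S.begin(), S.end());
  return Out;
}

// Hypothetical boolean form of the assertions above. A node with no edges
// on one side is not constrained on that side.
bool nodeIdsConsistent(const IdSet &NodeIds,
                       const std::vector<IdSet> &CalleeEdgeIds,
                       const std::vector<IdSet> &CallerEdgeIds) {
  if (!CalleeEdgeIds.empty() && unionOf(CalleeEdgeIds) != NodeIds)
    return false;
  if (!CallerEdgeIds.empty()) {
    IdSet Callers = unionOf(CallerEdgeIds);
    // std::includes tests "Callers is a subset of NodeIds" on sorted ranges.
    if (!std::includes(NodeIds.begin(), NodeIds.end(), Callers.begin(),
                       Callers.end()))
      return false;
  }
  return true;
}
```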

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				static void checkNodesRecursively(
				const ContextNode<DerivedCCG, FuncTy, CallTy> *Node,
				DenseSet<const ContextNode<DerivedCCG, FuncTy, CallTy> *> &Visited) {
				auto Inserted = Visited.insert(Node);
				if (!Inserted.second)
				return;
				checkNode<DerivedCCG, FuncTy, CallTy>(Node);
				for (auto &Edge : Node->CallerEdges) {
				// Separate edge and node checking so we don't check edges twice.
				checkEdge<DerivedCCG, FuncTy, CallTy>(Edge);
				checkNodesRecursively<DerivedCCG, FuncTy, CallTy>(Edge->Caller, Visited);
				}
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::check() const {
				DenseSet<const ContextNode *> Visited;
				for (auto &Entry : AllocationCallToContextNodeMap) {
				checkNodesRecursively<DerivedCCG, FuncTy, CallTy>(Entry.second, Visited);
				for (auto *Clone : Entry.second->Clones)
				checkNodesRecursively<DerivedCCG, FuncTy, CallTy>(Clone, Visited);
				}
				}

				namespace {
				struct Attributes {
				void add(const Twine &Name, const Twine &Value,
				const Twine &Comment = Twine());
				void addComment(const Twine &Comment);
				std::string getAsString() const;

				std::vector<std::string> Attrs;
				std::string Comments;
				};
				} // namespace

				void Attributes::add(const Twine &Name, const Twine &Value,
				const Twine &Comment) {
				std::string A = Name.str();
				A += "=\"";
				A += Value.str();
				A += "\"";
				Attrs.push_back(A);
				addComment(Comment);
				}

				void Attributes::addComment(const Twine &Comment) {
				if (!Comment.isTriviallyEmpty()) {
				if (Comments.empty())
				Comments = " // ";
				snehasish: Prefer incrementing this iterator outside the for loop. It was a little strange to see EI++ in the initialization and increment step.
				else
				Comments += ", ";
				Comments += Comment.str();
				}
				}

				std::string Attributes::getAsString() const {
				if (Attrs.empty())
				return "";

				std::string Ret = "[";
				for (auto &A : Attrs)
				Ret += A + ",";
				Ret.pop_back();
				Ret += "];";
				Ret += Comments;
				return Ret;
				}
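
The string produced by `Attributes::getAsString` is a bracketed, comma-separated list of `name="value"` pairs followed by a semicolon. The sketch below builds the same shape from a vector of pairs. `formatDotAttrs` is a hypothetical name, and the trailing comment handling of the original is left out.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: renders attributes as [a="x",b="y"]; and returns an
// empty string when there is nothing to print.
std::string formatDotAttrs(
    const std::vector<std::pair<std::string, std::string>> &Attrs) {
  if (Attrs.empty())
    return "";
  std::string Out = "[";
  for (std::size_t I = 0; I < Attrs.size(); ++I) {
    if (I)
      Out += ","; // separator goes before every entry except the first
    Out += Attrs[I].first + "=\"" + Attrs[I].second + "\"";
  }
  Out += "];";
  return Out;
}
```

Emitting the separator before each entry avoids the append-then-`pop_back` step used in the patch, at the cost of an index check per iteration.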

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::exportToDot(
				std::string Path) const {
				std::error_code EC;
				raw_fd_ostream OS(DotFilePathPrefix + Path, EC, sys::fs::OpenFlags::OF_None);
				if (EC) {
				errs() << "failed to open " << DotFilePathPrefix + Path << ": "
				<< EC.message() << '\n';
				errs().flush();
				exit(1);
				}

				auto NodeId = [](const ContextNode *Node) {
				std::stringstream sstream;
				sstream << std::hex << "N0x" << (unsigned long long)Node;
				std::string result = sstream.str();
				return result;
				};

				auto GetContextIds = [](const DenseSet<uint32_t> &ContextIds) {
				std::string IdString = " ContextIds:";
				if (ContextIds.size() < 100) {
				std::vector<uint32_t> SortedIds(ContextIds.begin(), ContextIds.end());
				std::sort(SortedIds.begin(), SortedIds.end());
				for (auto Id : SortedIds)
				IdString += " " + std::to_string(Id);
				} else {
				IdString += " (" + std::to_string(ContextIds.size()) + " ids)";
				}
				return IdString;
				};

				auto AddColorAttribute = [](uint8_t AllocTypes, Attributes &A) {
				if (AllocTypes == (uint8_t)AllocationType::NotCold)
				// Color "brown1" actually looks like a lighter red.
				A.add("fillcolor", "brown1", "default");
				else if (AllocTypes == (uint8_t)AllocationType::Cold)
				A.add("fillcolor", "cyan", "cold");
				else if (AllocTypes ==
				((uint8_t)AllocationType::NotCold | (uint8_t)AllocationType::Cold))
				// Lighter purple.
				A.add("fillcolor", "mediumorchid1", "default|cold");
				else
				A.add("fillcolor", "gray", "none");
				};

				auto DrawEdge = [&](const std::shared_ptr<ContextEdge> &Edge) {
				Attributes A;
				A.add("tooltip", GetContextIds(Edge->ContextIds));
				AddColorAttribute(Edge->AllocTypes, A);

				OS << " " << NodeId(Edge->Caller) << " -> " << NodeId(Edge->Callee)
				<< A.getAsString() << "\n";
				};

				auto NodeLabel = [&](const ContextNode *Node) {
				std::string LabelString =
				(Twine("OrigId: ") + (Node->IsAllocation ? "Alloc" : "") +
				std::to_string(Node->OrigStackOrAllocId))
				.str();
				LabelString += "\\n";
				if (Node->hasCall()) {
				auto Func = NodeToCallingFunc.find(Node);
				assert(Func != NodeToCallingFunc.end());
				LabelString +=
				getLabel(Func->second, Node->Call.call(), Node->Call.cloneNo());
				} else {
				LabelString += "null call";
				if (Node->Recursive)
				LabelString += " (recursive)";
				else
				LabelString += " (external)";
				}
				return LabelString;
				};

				OS << "digraph CallsiteContextGraph {\n";
				snehasish: It would be good to use llvm/Support/GraphWriter.h to export as dotGraph instead of re-implementing things here. You may need to use DOTGraphTraits.h too to implement some things such as tooltips which are used here. This is probably a significant change but it would reduce the amount of code we have to maintain in this file.
				tejohnson: Thanks for that pointer, will do. I copied the approach used by ModuleSummaryIndex, which seems like it should eventually be migrated to using the GraphWriter as well!

				DenseSet<const ContextNode *> Visited;
				std::vector<const ContextNode *> Worklist;
				for (auto &Entry : AllocationCallToContextNodeMap) {
				Worklist.push_back(Entry.second);
				for (auto *Clone : Entry.second->Clones)
				Worklist.push_back(Clone);
				}
				while (!Worklist.empty()) {
				const ContextNode *Node = Worklist.back();
				Worklist.pop_back();
				if (!Visited.insert(Node).second)
				continue;
				for (const auto &Edge : Node->CallerEdges)
				Worklist.push_back(Edge->Caller);

				Attributes A;
				A.add("shape", "record", "callsite");

				A.add("label", NodeLabel(Node));
				std::string TooltipString = NodeId(Node);
				TooltipString += GetContextIds(Node->ContextIds);
				A.add("tooltip", TooltipString);
				AddColorAttribute(Node->AllocTypes, A);
				A.add("style", "filled");
				if (Node->CloneOf) {
				A.add("color", "blue");
				A.add("style", "filled, bold, dashed");
				} else
				A.add("style", "filled");

				OS << " " << NodeId(Node) << " " << A.getAsString() << "\n";
				}

				OS << " // Edges:\n";

				Visited.clear();
				Worklist.clear();
				for (auto &Entry : AllocationCallToContextNodeMap) {
				Worklist.push_back(Entry.second);
				for (auto *Clone : Entry.second->Clones)
				Worklist.push_back(Clone);
				}
				while (!Worklist.empty()) {
				const ContextNode *Node = Worklist.back();
				Worklist.pop_back();
				if (!Visited.insert(Node).second)
				continue;

				for (const auto &Edge : Node->CallerEdges) {
				Worklist.push_back(Edge->Caller);

				DrawEdge(Edge);
				}
				}

				OS << "}";
				}
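
Both passes in `exportToDot` use the same traversal: seed a worklist with the allocation nodes, pop from the back, skip nodes already visited, and push each caller. A minimal sketch over an adjacency list follows. `visitFromRoots` and the index-based graph are assumptions for illustration; the patch walks `ContextNode` pointers through `CallerEdges`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical graph encoding: Callers[N] lists the nodes that call node N.
// Returns nodes in the order the worklist loop would emit them.
std::vector<std::size_t>
visitFromRoots(const std::vector<std::vector<std::size_t>> &Callers,
               const std::vector<std::size_t> &Roots) {
  std::vector<bool> Visited(Callers.size(), false);
  std::vector<std::size_t> Worklist(Roots.begin(), Roots.end());
  std::vector<std::size_t> Order;
  while (!Worklist.empty()) {
    std::size_t Node = Worklist.back();
    Worklist.pop_back();
    if (Visited[Node])
      continue; // a node reachable along several paths is emitted once
    Visited[Node] = true;
    Order.push_back(Node);
    for (std::size_t Caller : Callers[Node])
      Worklist.push_back(Caller);
  }
  return Order;
}
```

Since the worklist is used as a stack, the order is depth-first; the visited check is what keeps a caller shared by two callees from being printed twice.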

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				ContextNode<DerivedCCG, FuncTy, CallTy> *
				CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::moveEdgeToNewCalleeClone(
				const std::shared_ptr<ContextEdge> &Edge, EdgeIter *CallerEdgeI) {
				ContextNode *Node = Edge->Callee;
				ContextNode *Clone = Node->clone();
				assert(NodeToCallingFunc.count(Node));
				NodeToCallingFunc[Clone] = NodeToCallingFunc[Node];
				moveEdgeToExistingCalleeClone(Edge, Clone, CallerEdgeI, /*NewClone*/ true);
				return Clone;
				}

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::
				moveEdgeToExistingCalleeClone(const std::shared_ptr<ContextEdge> &Edge,
				ContextNode *NewCallee, EdgeIter *CallerEdgeI,
				bool NewClone) {
				// NewCallee and Edge's current callee must be clones of the same original
				// node (Edge's current callee may be the original node too).
				assert(NewCallee->getOrigNode() == Edge->Callee->getOrigNode());
				auto &EdgeContextIds = Edge->getContextIds();
				ContextNode *OldCallee = Edge->Callee;
				if (CallerEdgeI)
				*CallerEdgeI = OldCallee->CallerEdges.erase(*CallerEdgeI);
				else
				OldCallee->eraseCallerEdge(Edge.get());
				Edge->Callee = NewCallee;
				NewCallee->CallerEdges.push_back(Edge);
				// Don't need to update Edge's context ids since we are simply reconnecting
				// it.
				set_subtract(OldCallee->ContextIds, EdgeContextIds);
				NewCallee->ContextIds.insert(EdgeContextIds.begin(), EdgeContextIds.end());
				NewCallee->AllocTypes |= Edge->AllocTypes;
				OldCallee->AllocTypes = computeAllocType(OldCallee->ContextIds);
				// OldCallee alloc type should be None iff its context id set is now empty.
				assert((OldCallee->AllocTypes == (uint8_t)AllocationType::None) ==
				OldCallee->ContextIds.empty());
				// Now walk the old callee node's callee edges and move Edge's context ids
				// over to the corresponding edge into the clone (which is created here if
				// this is a newly created clone).
				for (auto &OldCalleeEdge : OldCallee->CalleeEdges) {
				// The context ids moving to the new callee are the subset of this edge's
				// context ids and the context ids on the caller edge being moved.
				DenseSet<uint32_t> EdgeContextIdsToMove =
				set_intersection(OldCalleeEdge->getContextIds(), EdgeContextIds);
				set_subtract(OldCalleeEdge->getContextIds(), EdgeContextIdsToMove);
				OldCalleeEdge->AllocTypes =
				computeAllocType(OldCalleeEdge->getContextIds());
				if (!NewClone) {
				// Update context ids / alloc type on corresponding edge to NewCallee.
				// There is a chance this may not exist if we are reusing an existing
				// clone, specifically during function assignment, where we would have
				// removed none type edges after creating the clone. If we can't find
				// a corresponding edge there, fall through to the cloning below.
				if (auto *NewCalleeEdge =
				NewCallee->findEdgeFromCallee(OldCalleeEdge->Callee)) {
				NewCalleeEdge->getContextIds().insert(EdgeContextIdsToMove.begin(),
				EdgeContextIdsToMove.end());
				NewCalleeEdge->AllocTypes |= computeAllocType(EdgeContextIdsToMove);
				continue;
				}
				}
				auto NewEdge = std::make_shared<ContextEdge>(
				OldCalleeEdge->Callee, NewCallee,
				computeAllocType(EdgeContextIdsToMove), EdgeContextIdsToMove);
				NewCallee->CalleeEdges.push_back(NewEdge);
				NewEdge->Callee->CallerEdges.push_back(NewEdge);
				}
				if (VerifyCCG) {
				checkNode<DerivedCCG, FuncTy, CallTy>(OldCallee);
				checkNode<DerivedCCG, FuncTy, CallTy>(NewCallee);
				for (const auto &Edge : OldCallee->CalleeEdges)
				checkNode<DerivedCCG, FuncTy, CallTy>(Edge->Callee);
				for (const auto &Edge : NewCallee->CalleeEdges)
				checkNode<DerivedCCG, FuncTy, CallTy>(Edge->Callee);
				}
				}
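
The core of `moveEdgeToExistingCalleeClone` is a split of each old callee edge's context ids: the ids shared with the moved caller edge follow it to the clone, and the rest stay behind. A minimal sketch of that split with standard algorithms follows. `SplitIds` and `splitCalleeEdgeIds` are hypothetical names, and `std::set` replaces `DenseSet`.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>

using IdSet = std::set<uint32_t>;

struct SplitIds {
  IdSet Stay; // ids that remain on the old callee's edge
  IdSet Move; // ids that follow the moved caller edge to the clone
};

// Hypothetical helper: Move is the intersection of the two inputs, Stay is
// what is left of the old callee edge after subtracting Move.
SplitIds splitCalleeEdgeIds(const IdSet &OldCalleeEdgeIds,
                            const IdSet &MovedCallerEdgeIds) {
  SplitIds R;
  std::set_intersection(OldCalleeEdgeIds.begin(), OldCalleeEdgeIds.end(),
                        MovedCallerEdgeIds.begin(), MovedCallerEdgeIds.end(),
                        std::inserter(R.Move, R.Move.end()));
  std::set_difference(OldCalleeEdgeIds.begin(), OldCalleeEdgeIds.end(),
                      R.Move.begin(), R.Move.end(),
                      std::inserter(R.Stay, R.Stay.end()));
  return R;
}
```

For an old callee edge with ids {1,2,3,4} and a moved caller edge with ids {2,4,9}, ids 2 and 4 move to the clone and ids 1 and 3 stay; id 9 never flowed through this callee edge and is unaffected.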

				template <typename DerivedCCG, typename FuncTy, typename CallTy>
				bool CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::process() {
				if (DumpCCG) {
				dbgs() << "CCG before cloning:\n";
				dbgs() << *this;
				}
				if (ExportToDot)
				exportToDot("ccg.postbuild.dot");

				if (VerifyCCG) {
				check();
				}

				return false;
				}

				bool PGHOContextDisambiguation::processModule(
				Module &M) {
				bool Changed = false;

				ModuleCallsiteContextGraph CCG(M);
				Changed = CCG.process();

				return Changed;
				}

				PreservedAnalyses PGHOContextDisambiguation::run(Module &M,
				ModuleAnalysisManager &AM) {
				if (!processModule(M))
				return PreservedAnalyses::all();
				return PreservedAnalyses::none();
				}

				void PGHOContextDisambiguation::run(
				ModuleSummaryIndex &Index,
				function_ref<bool(GlobalValue::GUID, const GlobalValueSummary *)>
				isPrevailing) {
				IndexCallsiteContextGraph CCG(Index, isPrevailing);
				CCG.process();
				}

llvm/test/ThinLTO/X86/pgho-basic.ll

This file was added.

				;; Test callsite context graph generation for simple call graph with
				;; two memprof contexts and no inlining.
				;;
				;; Original code looks like:
				;;
				;; char *bar() {
				;; return new char[10];
				;; }
				;;
				;; char *baz() {
				;; return bar();
				;; }
				;;
				;; char *foo() {
				;; return baz();
				;; }
				;;
				;; int main(int argc, char **argv) {
				;; char *x = foo();
				;; char *y = foo();
				;; memset(x, 0, 10);
				;; memset(y, 0, 10);
				;; delete[] x;
				;; sleep(10);
				;; delete[] y;
				;; return 0;
				;; }
				;;
				;; Code compiled with -mllvm -memprof-min-lifetime-cold-threshold=5 so that the
				;; memory freed after sleep(10) results in cold lifetimes.

				; RUN: opt -thinlto-bc %s >%t.o
				; RUN: llvm-lto2 run %t.o -enable-pgho-context-disambiguation \
				; RUN: -r=%t.o,main,plx \
				; RUN: -r=%t.o,_ZdaPv, \
				; RUN: -r=%t.o,sleep, \
				; RUN: -r=%t.o,_Znam, \
				; RUN: -pgho-verify-ccg -pgho-verify-nodes -pgho-dump-ccg \
				; RUN: -pgho-export-to-dot -pgho-dot-file-path-prefix=%t. \
				; RUN: -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP

				; RUN: cat %t.ccg.postbuild.dot | FileCheck %s --check-prefix=DOT

				; ModuleID = 'pgho-basic.ll'
				source_filename = "pgho-basic.ll"
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: mustprogress noinline norecurse optnone uwtable
				define dso_local noundef i32 @main(i32 noundef %argc, ptr noundef %argv) #0 {
				entry:
				%retval = alloca i32, align 4
				%argc.addr = alloca i32, align 4
				%argv.addr = alloca ptr, align 8
				%x = alloca ptr, align 8
				%y = alloca ptr, align 8
				store i32 0, ptr %retval, align 4
				store i32 %argc, ptr %argc.addr, align 4
				store ptr %argv, ptr %argv.addr, align 8
				%call = call noundef ptr @_Z3foov(), !callsite !7
				store ptr %call, ptr %x, align 8
				%call1 = call noundef ptr @_Z3foov(), !callsite !8
				store ptr %call1, ptr %y, align 8
				%0 = load ptr, ptr %x, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %0, i8 0, i64 10, i1 false)
				%1 = load ptr, ptr %y, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %1, i8 0, i64 10, i1 false)
				%2 = load ptr, ptr %x, align 8
				%isnull = icmp eq ptr %2, null
				br i1 %isnull, label %delete.end, label %delete.notnull

				delete.notnull: ; preds = %entry
				call void @_ZdaPv(ptr noundef %2) #6
				br label %delete.end

				delete.end: ; preds = %delete.notnull, %entry
				%call2 = call i32 @sleep(i32 noundef 10)
				%3 = load ptr, ptr %y, align 8
				%isnull3 = icmp eq ptr %3, null
				br i1 %isnull3, label %delete.end5, label %delete.notnull4

				delete.notnull4: ; preds = %delete.end
				call void @_ZdaPv(ptr noundef %3) #6
				br label %delete.end5

				delete.end5: ; preds = %delete.notnull4, %delete.end
				ret i32 0
				}

				; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
				declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1

				; Function Attrs: nobuiltin nounwind
				declare void @_ZdaPv(ptr noundef) #2

				declare i32 @sleep(i32 noundef) #3

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z3barv() #4 {
				entry:
				%call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #7, !memprof !9, !callsite !14
				ret ptr %call
				}

				; Function Attrs: nobuiltin allocsize(0)
				declare noundef nonnull ptr @_Znam(i64 noundef) #5

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z3bazv() #4 {
				entry:
				%call = call noundef ptr @_Z3barv(), !callsite !15
				ret ptr %call
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z3foov() #4 {
				entry:
				%call = call noundef ptr @_Z3bazv(), !callsite !16
				ret ptr %call
				}

				attributes #0 = { mustprogress noinline norecurse optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #1 = { nocallback nofree nounwind willreturn memory(argmem: write) }
				attributes #2 = { nobuiltin nounwind "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #3 = { "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #4 = { mustprogress noinline optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #5 = { nobuiltin allocsize(0) "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #6 = { builtin nounwind }
				attributes #7 = { builtin allocsize(0) }

				!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}

				!0 = !{i32 7, !"Dwarf Version", i32 5}
				!1 = !{i32 2, !"Debug Info Version", i32 3}
				!2 = !{i32 1, !"wchar_size", i32 4}
				!3 = !{i32 8, !"PIC Level", i32 2}
				!4 = !{i32 7, !"PIE Level", i32 2}
				!5 = !{i32 7, !"uwtable", i32 2}
				!6 = !{i32 7, !"frame-pointer", i32 2}
				!7 = !{i64 8632435727821051414}
				!8 = !{i64 -3421689549917153178}
				!9 = !{!10, !12}
				!10 = !{!11, !"notcold"}
				!11 = !{i64 9086428284934609951, i64 -5964873800580613432, i64 2732490490862098848, i64 8632435727821051414}
				!12 = !{!13, !"cold"}
				!13 = !{i64 9086428284934609951, i64 -5964873800580613432, i64 2732490490862098848, i64 -3421689549917153178}
				!14 = !{i64 9086428284934609951}
				!15 = !{i64 -5964873800580613432}
				!16 = !{i64 2732490490862098848}

				; DUMP: CCG before cloning:
				; DUMP: Callsite Context Graph:
				; DUMP: Node [[BAR:0x[a-z0-9]+]]
				; DUMP: Versions: 1 MIB:
				; DUMP: AllocType 1 StackIds: 2, 3, 0
				; DUMP: AllocType 2 StackIds: 2, 3, 1
				; DUMP: (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2
				; DUMP: CalleeEdges:
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[BAZ:0x[a-z0-9]+]] AllocTypes: NotColdCold ContextIds: 1 2

				; DUMP: Node [[BAZ]]
				; DUMP: Callee: 10756268697391741933 (_Z3barv) Clones: 0 StackIds: 2 (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[BAZ]] AllocTypes: NotColdCold ContextIds: 1 2
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[BAZ]] to Caller: [[FOO:0x[a-z0-9]+]] AllocTypes: NotColdCold ContextIds: 1 2

				; DUMP: Node [[FOO]]
				; DUMP: Callee: 17547784407117670007 (_Z3bazv) Clones: 0 StackIds: 3 (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAZ]] to Caller: [[FOO]] AllocTypes: NotColdCold ContextIds: 1 2
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN1:0x[a-z0-9]+]] AllocTypes: NotCold ContextIds: 1
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN2:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 2

				; DUMP: Node [[MAIN1]]
				; DUMP: Callee: 6988045695824228603 (_Z3foov) Clones: 0 StackIds: 0 (clone 0)
				; DUMP: AllocTypes: NotCold
				; DUMP: ContextIds: 1
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN1]] AllocTypes: NotCold ContextIds: 1
				; DUMP: CallerEdges:

				; DUMP: Node [[MAIN2]]
				; DUMP: Callee: 6988045695824228603 (_Z3foov) Clones: 0 StackIds: 1 (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN2]] AllocTypes: Cold ContextIds: 2
				; DUMP: CallerEdges:


				; DOT: digraph CallsiteContextGraph {
				; DOT: N[[BAR:0x[a-z0-9]+]] [shape="record",label="OrigId: Alloc0\n_Z3barv -\> alloc",tooltip="N[[BAR]] ContextIds: 1 2",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[BAZ:0x[a-z0-9]+]] [shape="record",label="OrigId: 12481870273128938184\n_Z3bazv -\> _Z3barv",tooltip="N[[BAZ]] ContextIds: 1 2",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[FOO:0x[a-z0-9]+]] [shape="record",label="OrigId: 2732490490862098848\n_Z3foov -\> _Z3bazv",tooltip="N[[FOO]] ContextIds: 1 2",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[MAIN1:0x[a-z0-9]+]] [shape="record",label="OrigId: 15025054523792398438\nmain -\> _Z3foov",tooltip="N[[MAIN1]] ContextIds: 2",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOT: N[[MAIN2:0x[a-z0-9]+]] [shape="record",label="OrigId: 8632435727821051414\nmain -\> _Z3foov",tooltip="N[[MAIN2]] ContextIds: 1",fillcolor="brown1",style="filled",style="filled"]; // callsite, default
				; DOT: // Edges:
				; DOT: N[[BAZ]] -> N[[BAR]][tooltip=" ContextIds: 1 2",fillcolor="mediumorchid1"]; // default|cold
				; DOT: N[[FOO]] -> N[[BAZ]][tooltip=" ContextIds: 1 2",fillcolor="mediumorchid1"]; // default|cold
				; DOT: N[[MAIN2]] -> N[[FOO]][tooltip=" ContextIds: 1",fillcolor="brown1"]; // default
				; DOT: N[[MAIN1]] -> N[[FOO]][tooltip=" ContextIds: 2",fillcolor="cyan"]; // cold
				; DOT: }
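
The DUMP checks in pgho-basic.ll follow from how the two memprof contexts share frames: both pass through bar, baz, and foo, and they diverge only at the two calls in main. The sketch below rebuilds those per-edge context id sets from two call stacks. It is a simplified model under stated assumptions, not the pass's data structure: frames are named by string rather than by 64-bit stack id, and StackSketch, EdgeMap, buildEdgesSketch, and basicTestStacks are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one MIB call stack, listed from the allocation
// outward (callee first).
using StackSketch = std::vector<std::string>;

// Edges keyed by (callee, caller); the value is the set of context ids that
// flow along that edge.
using EdgeMap =
    std::map<std::pair<std::string, std::string>, std::set<uint32_t>>;

// Assign context ids 1..N in stack order and record, for each adjacent pair
// of frames, that the context passes through that callee -> caller edge.
inline EdgeMap buildEdgesSketch(const std::vector<StackSketch> &Stacks) {
  EdgeMap Edges;
  uint32_t NextId = 1;
  for (const StackSketch &Stack : Stacks) {
    uint32_t Id = NextId++;
    for (std::size_t I = 0; I + 1 < Stack.size(); ++I)
      Edges[{Stack[I], Stack[I + 1]}].insert(Id);
  }
  return Edges;
}

// The two contexts from pgho-basic.ll: notcold via the first call in main,
// cold via the second.
inline std::vector<StackSketch> basicTestStacks() {
  return {{"bar", "baz", "foo", "main1"}, {"bar", "baz", "foo", "main2"}};
}
```

Running buildEdgesSketch on basicTestStacks() gives context ids {1, 2} on the bar-to-baz and baz-to-foo edges, and a single id on each foo-to-main edge, matching the ContextIds columns in the DUMP lines above.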

llvm/test/ThinLTO/X86/pgho-duplicate-context-ids.ll

This file was added.

				;; Test callsite context graph generation for call graph with MIBs
				;; that have pruned contexts that partially match multiple inlined
				;; callsite contexts, requiring duplication of context ids and nodes
				;; while matching callsite nodes onto the graph.
				;;
				;; Original code looks like:
				;;
				;; char *D() {
				;; return new char[10];
				;; }
				;;
				;; char *F() {
				;; return D();
				;; }
				;;
				;; char *C() {
				;; return D();
				;; }
				;;
				;; char *B() {
				;; return C();
				;; }
				;;
				;; char *E() {
				;; return C();
				;; }
				;; int main(int argc, char **argv) {
				;; char *x = B(); // cold
				;; char *y = E(); // cold
				;; char *z = F(); // default
				;; memset(x, 0, 10);
				;; memset(y, 0, 10);
				;; memset(z, 0, 10);
				;; delete[] z;
				;; sleep(10);
				;; delete[] x;
				;; delete[] y;
				;; return 0;
				;; }
				;;
				;; Code compiled with -mllvm -memprof-min-lifetime-cold-threshold=5 so that the
				;; memory freed after sleep(10) results in cold lifetimes.
				;;
				;; The code below was created by forcing inlining of C into both B and E.
				;; Since both allocation contexts via C are cold, the matched memprof
				;; metadata has the context pruned above C's callsite. This requires
				;; matching the stack node for C to callsites where it was inlined (i.e.
				;; the callsites in B and E that have callsite metadata that includes C's).
				;; It also requires duplication of that node in the graph as well as the
				;; duplication of the context ids along that path through the graph,
				;; so that we can represent the duplicated (via inlining) C callsite.

				; RUN: opt -thinlto-bc %s >%t.o
				; RUN: llvm-lto2 run %t.o -enable-pgho-context-disambiguation \
				; RUN: -r=%t.o,main,plx \
				; RUN: -r=%t.o,_ZdaPv, \
				; RUN: -r=%t.o,sleep, \
				; RUN: -r=%t.o,_Znam, \
				; RUN: -pgho-verify-ccg -pgho-verify-nodes -pgho-dump-ccg \
				; RUN: -pgho-export-to-dot -pgho-dot-file-path-prefix=%t. \
				; RUN: -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP

				; RUN: cat %t.ccg.prestackupdate.dot | FileCheck %s --check-prefix=DOTPRE
				; RUN: cat %t.ccg.postbuild.dot | FileCheck %s --check-prefix=DOTPOST

				; ModuleID = 'duplicate-context-ids.ll'
				source_filename = "duplicate-context-ids.ll"
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z1Dv() #0 {
				entry:
				%call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #6, !memprof !7, !callsite !12
				ret ptr %call
				}

				; Function Attrs: nobuiltin allocsize(0)
				declare noundef nonnull ptr @_Znam(i64 noundef) #1

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z1Fv() #0 {
				entry:
				%call = call noundef ptr @_Z1Dv(), !callsite !13
				ret ptr %call
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z1Cv() #0 {
				entry:
				%call = call noundef ptr @_Z1Dv(), !callsite !14
				ret ptr %call
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z1Bv() #0 {
				entry:
				%call.i = call noundef ptr @_Z1Dv(), !callsite !15
				ret ptr %call.i
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z1Ev() #0 {
				entry:
				%call.i = call noundef ptr @_Z1Dv(), !callsite !16
				ret ptr %call.i
				}

				; Function Attrs: mustprogress noinline norecurse optnone uwtable
				define dso_local noundef i32 @main(i32 noundef %argc, ptr noundef %argv) #2 {
				entry:
				%retval = alloca i32, align 4
				%argc.addr = alloca i32, align 4
				%argv.addr = alloca ptr, align 8
				%x = alloca ptr, align 8
				%y = alloca ptr, align 8
				%z = alloca ptr, align 8
				store i32 0, ptr %retval, align 4
				store i32 %argc, ptr %argc.addr, align 4
				store ptr %argv, ptr %argv.addr, align 8
				%call = call noundef ptr @_Z1Bv(), !callsite !17
				store ptr %call, ptr %x, align 8
				%call1 = call noundef ptr @_Z1Ev(), !callsite !18
				store ptr %call1, ptr %y, align 8
				%call2 = call noundef ptr @_Z1Fv(), !callsite !19
				store ptr %call2, ptr %z, align 8
				%0 = load ptr, ptr %x, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %0, i8 0, i64 10, i1 false)
				%1 = load ptr, ptr %y, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %1, i8 0, i64 10, i1 false)
				%2 = load ptr, ptr %z, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %2, i8 0, i64 10, i1 false)
				%3 = load ptr, ptr %z, align 8
				%isnull = icmp eq ptr %3, null
				br i1 %isnull, label %delete.end, label %delete.notnull

				delete.notnull: ; preds = %entry
				call void @_ZdaPv(ptr noundef %3) #7
				br label %delete.end

				delete.end: ; preds = %delete.notnull, %entry
				%call3 = call i32 @sleep(i32 noundef 10)
				%4 = load ptr, ptr %x, align 8
				%isnull4 = icmp eq ptr %4, null
				br i1 %isnull4, label %delete.end6, label %delete.notnull5

				delete.notnull5: ; preds = %delete.end
				call void @_ZdaPv(ptr noundef %4) #7
				br label %delete.end6

				delete.end6: ; preds = %delete.notnull5, %delete.end
				%5 = load ptr, ptr %y, align 8
				%isnull7 = icmp eq ptr %5, null
				br i1 %isnull7, label %delete.end9, label %delete.notnull8

				delete.notnull8: ; preds = %delete.end6
				call void @_ZdaPv(ptr noundef %5) #7
				br label %delete.end9

				delete.end9: ; preds = %delete.notnull8, %delete.end6
				ret i32 0
				}

				; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
				declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #3

				; Function Attrs: nobuiltin nounwind
				declare void @_ZdaPv(ptr noundef) #4

				declare i32 @sleep(i32 noundef) #5

				attributes #0 = { mustprogress noinline optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #1 = { nobuiltin allocsize(0) "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #2 = { mustprogress noinline norecurse optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #3 = { nocallback nofree nounwind willreturn memory(argmem: write) }
				attributes #4 = { nobuiltin nounwind "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #5 = { "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #6 = { builtin allocsize(0) }
				attributes #7 = { builtin nounwind }

				!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}

				!0 = !{i32 7, !"Dwarf Version", i32 5}
				!1 = !{i32 2, !"Debug Info Version", i32 3}
				!2 = !{i32 1, !"wchar_size", i32 4}
				!3 = !{i32 8, !"PIC Level", i32 2}
				!4 = !{i32 7, !"PIE Level", i32 2}
				!5 = !{i32 7, !"uwtable", i32 2}
				!6 = !{i32 7, !"frame-pointer", i32 2}
				!7 = !{!8, !10}
				!8 = !{!9, !"cold"}
				!9 = !{i64 6541423618768552252, i64 -6270142974039008131}
				!10 = !{!11, !"notcold"}
				!11 = !{i64 6541423618768552252, i64 -4903163940066524832}
				!12 = !{i64 6541423618768552252}
				!13 = !{i64 -4903163940066524832}
				!14 = !{i64 -6270142974039008131}
				!15 = !{i64 -6270142974039008131, i64 -184525619819294889}
				!16 = !{i64 -6270142974039008131, i64 1905834578520680781}
				!17 = !{i64 8632435727821051414}
				!18 = !{i64 -3421689549917153178}
				!19 = !{i64 6307901912192269588}


				;; After adding only the alloc node memprof metadata, we only have 2 contexts.

				; DUMP: CCG before updating call stack chains:
				; DUMP: Callsite Context Graph:
				; DUMP: Node [[D:0x[a-z0-9]+]]
				; DUMP: Versions: 1 MIB:
				; DUMP: AllocType 2 StackIds: 0
				; DUMP: AllocType 1 StackIds: 1
				; DUMP: (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2
				; DUMP: CalleeEdges:
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[C:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 1
				; DUMP: Edge from Callee [[D]] to Caller: [[F:0x[a-z0-9]+]] AllocTypes: NotCold ContextIds: 2

				; DUMP: Node [[C]]
				; DUMP: null Call
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 1
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[C]] AllocTypes: Cold ContextIds: 1
				; DUMP: CallerEdges:

				; DUMP: Node [[F]]
				; DUMP: null Call
				; DUMP: AllocTypes: NotCold
				; DUMP: ContextIds: 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[F]] AllocTypes: NotCold ContextIds: 2
				; DUMP: CallerEdges:

				;; After updating for callsite metadata, we should have generated context ids 3 and 4,
				;; along with 2 new nodes for those callsites. All have the same allocation type
				;; behavior as the original C node.

				; DUMP: CCG before cloning:
				; DUMP: Callsite Context Graph:
				; DUMP: Node [[D]]
				; DUMP: Versions: 1 MIB:
				; DUMP: AllocType 2 StackIds: 0
				; DUMP: AllocType 1 StackIds: 1
				; DUMP: (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2 3 4
				; DUMP: CalleeEdges:
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[F]] AllocTypes: NotCold ContextIds: 2
				; DUMP: Edge from Callee [[D]] to Caller: [[C2:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 3
				; DUMP: Edge from Callee [[D]] to Caller: [[B:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 4
				; DUMP: Edge from Callee [[D]] to Caller: [[E:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 1

				; DUMP: Node [[F]]
				; DUMP: Callee: 4881081444663423788 (_Z1Dv) Clones: 0 StackIds: 1 (clone 0)
				; DUMP: AllocTypes: NotCold
				; DUMP: ContextIds: 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[F]] AllocTypes: NotCold ContextIds: 2
				; DUMP: CallerEdges:

				; DUMP: Node [[C2]]
				; DUMP: Callee: 4881081444663423788 (_Z1Dv) Clones: 0 StackIds: 0 (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 3
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[C2]] AllocTypes: Cold ContextIds: 3
				; DUMP: CallerEdges:

				; DUMP: Node [[B]]
				; DUMP: Callee: 4881081444663423788 (_Z1Dv) Clones: 0 StackIds: 0, 2 (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 4
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[B]] AllocTypes: Cold ContextIds: 4
				; DUMP: CallerEdges:

				; DUMP: Node [[E]]
				; DUMP: Callee: 4881081444663423788 (_Z1Dv) Clones: 0 StackIds: 0, 3 (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 1
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[E]] AllocTypes: Cold ContextIds: 1
				; DUMP: CallerEdges:


				; DOTPRE: digraph CallsiteContextGraph {
				; DOTPRE: N[[D:0x[a-z0-9]+]] [shape="record",label="OrigId: Alloc0\n_Z1Dv -\> alloc",tooltip="N[[D]] ContextIds: 1 2",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOTPRE: N[[F:0x[a-z0-9]+]] [shape="record",label="OrigId: 13543580133643026784\nnull call (external)",tooltip="N[[F]] ContextIds: 2",fillcolor="brown1",style="filled",style="filled"]; // callsite, default
				; DOTPRE: N[[C:0x[a-z0-9]+]] [shape="record",label="OrigId: 12176601099670543485\nnull call (external)",tooltip="N[[C]] ContextIds: 1",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOTPRE: // Edges:
				; DOTPRE: N[[C]] -> N[[D]][tooltip=" ContextIds: 1",fillcolor="cyan"]; // cold
				; DOTPRE: N[[F]] -> N[[D]][tooltip=" ContextIds: 2",fillcolor="brown1"]; // default
				; DOTPRE: }


				; DOTPOST: digraph CallsiteContextGraph {
				; DOTPOST: N[[D:0x[a-z0-9]+]] [shape="record",label="OrigId: Alloc0\n_Z1Dv -\> alloc",tooltip="N[[D]] ContextIds: 1 2 3 4",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOTPOST: N[[E:0x[a-z0-9]+]] [shape="record",label="OrigId: 0\n_Z1Ev -\> _Z1Dv",tooltip="N[[E]] ContextIds: 1",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOTPOST: N[[B:0x[a-z0-9]+]] [shape="record",label="OrigId: 0\n_Z1Bv -\> _Z1Dv",tooltip="N[[B]] ContextIds: 4",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOTPOST: N[[C:0x[a-z0-9]+]] [shape="record",label="OrigId: 0\n_Z1Cv -\> _Z1Dv",tooltip="N[[C]] ContextIds: 3",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOTPOST: N[[F:0x[a-z0-9]+]] [shape="record",label="OrigId: 13543580133643026784\n_Z1Fv -\> _Z1Dv",tooltip="N[[F]] ContextIds: 2",fillcolor="brown1",style="filled",style="filled"]; // callsite, default
				; DOTPOST: // Edges:
				; DOTPOST: N[[F]] -> N[[D]][tooltip=" ContextIds: 2",fillcolor="brown1"]; // default
				; DOTPOST: N[[C]] -> N[[D]][tooltip=" ContextIds: 3",fillcolor="cyan"]; // cold
				; DOTPOST: N[[B]] -> N[[D]][tooltip=" ContextIds: 4",fillcolor="cyan"]; // cold
				; DOTPOST: N[[E]] -> N[[D]][tooltip=" ContextIds: 1",fillcolor="cyan"]; // cold
				; DOTPOST: }
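
The test comments above describe why context ids must be duplicated: the cold context is pruned at C's callsite, yet three call instructions (the one in C and the copies inlined into B and E) carry callsite metadata starting with C's stack id. The following sketch illustrates that duplication in isolation. It makes simplifying assumptions that are not taken from the patch: it matches only on the innermost frame, and which callsite keeps the original id is arbitrary. CallsiteSketch and duplicateContextIdsSketch are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of a call instruction: a label plus the stack ids
// in its callsite metadata, innermost (inlined) frame first.
struct CallsiteSketch {
  std::string Name;
  std::vector<int64_t> StackIds;
};

// Give every callsite whose innermost frame equals PrunedStackId its own
// context id. The first match keeps OrigContextId; each later match takes a
// fresh id from NextContextId. The choice of which callsite keeps the
// original id is arbitrary in this sketch.
inline std::map<std::string, uint32_t>
duplicateContextIdsSketch(const std::vector<CallsiteSketch> &Callsites,
                          int64_t PrunedStackId, uint32_t OrigContextId,
                          uint32_t &NextContextId) {
  std::map<std::string, uint32_t> Assigned;
  bool OrigUsed = false;
  for (const CallsiteSketch &CS : Callsites) {
    if (CS.StackIds.empty() || CS.StackIds.front() != PrunedStackId)
      continue; // This callsite is not on the pruned context's path.
    Assigned[CS.Name] = OrigUsed ? NextContextId++ : OrigContextId;
    OrigUsed = true;
  }
  return Assigned;
}
```

Fed the callsite metadata from this test (!14, !15, !16 for C, B, and E, and !13 for F) with original cold context id 1 and next free id 3, the sketch yields three distinct ids for C, B, and E and leaves F untouched. That agrees with the "CCG before cloning" dump in having ids 1, 3, and 4 on three separate cold edges, though the assignment of ids to particular nodes may differ from the pass.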

llvm/test/ThinLTO/X86/pgho-indirectcall.ll

This file was added.

				;; Tests callsite context graph generation for call graph containing indirect
				;; calls. Currently this should result in conservative behavior, such that the
				;; indirect call receives a null call in its graph node, to prevent subsequent
				;; cloning.
				;;
				;; Original code looks like:
				;;
				;; char *foo() {
				;; return new char[10];
				;; }
				;; class A {
				;; public:
				;; virtual char *x() { return foo(); }
				;; };
				;; class B : public A {
				;; public:
				;; char *x() final { return foo(); }
				;; };
				;; char *bar(A *a) {
				;; return a->x();
				;; }
				;; int main(int argc, char **argv) {
				;; char *x = foo();
				;; char *y = foo();
				;; B b;
				;; char *z = bar(&b);
				;; char *w = bar(&b);
				;; A a;
				;; char *r = bar(&a);
				;; char *s = bar(&a);
				;; memset(x, 0, 10);
				;; memset(y, 0, 10);
				;; memset(z, 0, 10);
				;; memset(w, 0, 10);
				;; memset(r, 0, 10);
				;; memset(s, 0, 10);
				;; delete[] x;
				;; delete[] w;
				;; delete[] r;
				;; sleep(10);
				;; delete[] y;
				;; delete[] z;
				;; delete[] s;
				;; return 0;
				;; }
				;;
				;; Code compiled with -mllvm -memprof-min-lifetime-cold-threshold=5 so that the
				;; memory freed after sleep(10) results in cold lifetimes.
				;;
				;; Compiled without optimization to prevent inlining and devirtualization.

				; RUN: opt -thinlto-bc %s >%t.o
				; RUN: llvm-lto2 run %t.o -enable-pgho-context-disambiguation \
				; RUN: -r=%t.o,main,plx \
				; RUN: -r=%t.o,sleep, \
				; RUN: -r=%t.o,_Znam, \
				; RUN: -r=%t.o,_ZdaPv, \
				; RUN: -r=%t.o,_ZTVN10__cxxabiv120__si_class_type_infoE, \
				; RUN: -r=%t.o,_ZTVN10__cxxabiv117__class_type_infoE, \
				; RUN: -pgho-verify-ccg -pgho-verify-nodes -pgho-dump-ccg \
				; RUN: -pgho-export-to-dot -pgho-dot-file-path-prefix=%t. \
				; RUN: -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP

				; RUN: cat %t.ccg.postbuild.dot | FileCheck %s --check-prefix=DOT

				; ModuleID = 'indirectcall.ll'
				source_filename = "indirectcall.ll"
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				%class.B = type { %class.A }
				%class.A = type { ptr }

				$_ZN1BC2Ev = comdat any

				$_ZN1AC2Ev = comdat any

				$_ZN1A1xEv = comdat any

				$_ZN1B1xEv = comdat any

				$_ZTV1B = comdat any

				$_ZTS1B = comdat any

				$_ZTS1A = comdat any

				$_ZTI1A = comdat any

				$_ZTI1B = comdat any

				$_ZTV1A = comdat any

				@_ZTV1B = internal unnamed_addr constant { [3 x ptr] } { [3 x ptr] [ptr null, ptr @_ZTI1B, ptr @_ZN1B1xEv] }, comdat, align 8, !type !0, !type !1, !type !2, !type !3
				@_ZTVN10__cxxabiv120__si_class_type_infoE = external global ptr
				@_ZTS1B = internal constant [3 x i8] c"1B\00", comdat, align 1
				@_ZTVN10__cxxabiv117__class_type_infoE = external global ptr
				@_ZTS1A = internal constant [3 x i8] c"1A\00", comdat, align 1
				@_ZTI1A = internal constant { ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv117__class_type_infoE, i64 2), ptr @_ZTS1A }, comdat, align 8
				@_ZTI1B = internal constant { ptr, ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv120__si_class_type_infoE, i64 2), ptr @_ZTS1B, ptr @_ZTI1A }, comdat, align 8
				@_ZTV1A = internal unnamed_addr constant { [3 x ptr] } { [3 x ptr] [ptr null, ptr @_ZTI1A, ptr @_ZN1A1xEv] }, comdat, align 8, !type !0, !type !1

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z3barP1A(ptr noundef %a) #0 {
				entry:
				%a.addr = alloca ptr, align 8
				store ptr %a, ptr %a.addr, align 8
				%0 = load ptr, ptr %a.addr, align 8
				%vtable = load ptr, ptr %0, align 8
				%vfn = getelementptr inbounds ptr, ptr %vtable, i64 0
				%1 = load ptr, ptr %vfn, align 8
				%call = call noundef ptr %1(ptr noundef nonnull align 8 dereferenceable(8) %0), !callsite !11
				ret ptr %call
				}

				; Function Attrs: mustprogress noinline norecurse optnone uwtable
				define dso_local noundef i32 @main(i32 noundef %argc, ptr noundef %argv) #1 {
				entry:
				%retval = alloca i32, align 4
				%argc.addr = alloca i32, align 4
				%argv.addr = alloca ptr, align 8
				%x = alloca ptr, align 8
				%y = alloca ptr, align 8
				%b = alloca %class.B, align 8
				%z = alloca ptr, align 8
				%w = alloca ptr, align 8
				%a = alloca %class.A, align 8
				%r = alloca ptr, align 8
				%s = alloca ptr, align 8
				store i32 0, ptr %retval, align 4
				store i32 %argc, ptr %argc.addr, align 4
				store ptr %argv, ptr %argv.addr, align 8
				%call = call noundef ptr @_Z3foov(), !callsite !12
				store ptr %call, ptr %x, align 8
				%call1 = call noundef ptr @_Z3foov(), !callsite !13
				store ptr %call1, ptr %y, align 8
				call void @_ZN1BC2Ev(ptr noundef nonnull align 8 dereferenceable(8) %b) #7
				%call2 = call noundef ptr @_Z3barP1A(ptr noundef %b), !callsite !14
				store ptr %call2, ptr %z, align 8
				%call3 = call noundef ptr @_Z3barP1A(ptr noundef %b), !callsite !15
				store ptr %call3, ptr %w, align 8
				call void @_ZN1AC2Ev(ptr noundef nonnull align 8 dereferenceable(8) %a) #7
				%call4 = call noundef ptr @_Z3barP1A(ptr noundef %a), !callsite !16
				store ptr %call4, ptr %r, align 8
				%call5 = call noundef ptr @_Z3barP1A(ptr noundef %a), !callsite !17
				store ptr %call5, ptr %s, align 8
				%0 = load ptr, ptr %x, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %0, i8 0, i64 10, i1 false)
				%1 = load ptr, ptr %y, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %1, i8 0, i64 10, i1 false)
				%2 = load ptr, ptr %z, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %2, i8 0, i64 10, i1 false)
				%3 = load ptr, ptr %w, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %3, i8 0, i64 10, i1 false)
				%4 = load ptr, ptr %r, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %4, i8 0, i64 10, i1 false)
				%5 = load ptr, ptr %s, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %5, i8 0, i64 10, i1 false)
				%6 = load ptr, ptr %x, align 8
				%isnull = icmp eq ptr %6, null
				br i1 %isnull, label %delete.end, label %delete.notnull

				delete.notnull: ; preds = %entry
				call void @_ZdaPv(ptr noundef %6) #8
				br label %delete.end

				delete.end: ; preds = %delete.notnull, %entry
				%7 = load ptr, ptr %w, align 8
				%isnull6 = icmp eq ptr %7, null
				br i1 %isnull6, label %delete.end8, label %delete.notnull7

				delete.notnull7: ; preds = %delete.end
				call void @_ZdaPv(ptr noundef %7) #8
				br label %delete.end8

				delete.end8: ; preds = %delete.notnull7, %delete.end
				%8 = load ptr, ptr %r, align 8
				%isnull9 = icmp eq ptr %8, null
				br i1 %isnull9, label %delete.end11, label %delete.notnull10

				delete.notnull10: ; preds = %delete.end8
				call void @_ZdaPv(ptr noundef %8) #8
				br label %delete.end11

				delete.end11: ; preds = %delete.notnull10, %delete.end8
				%call12 = call i32 @sleep(i32 noundef 10)
				%9 = load ptr, ptr %y, align 8
				%isnull13 = icmp eq ptr %9, null
				br i1 %isnull13, label %delete.end15, label %delete.notnull14

				delete.notnull14: ; preds = %delete.end11
				call void @_ZdaPv(ptr noundef %9) #8
				br label %delete.end15

				delete.end15: ; preds = %delete.notnull14, %delete.end11
				%10 = load ptr, ptr %z, align 8
				%isnull16 = icmp eq ptr %10, null
				br i1 %isnull16, label %delete.end18, label %delete.notnull17

				delete.notnull17: ; preds = %delete.end15
				call void @_ZdaPv(ptr noundef %10) #8
				br label %delete.end18

				delete.end18: ; preds = %delete.notnull17, %delete.end15
				%11 = load ptr, ptr %s, align 8
				%isnull19 = icmp eq ptr %11, null
				br i1 %isnull19, label %delete.end21, label %delete.notnull20

				delete.notnull20: ; preds = %delete.end18
				call void @_ZdaPv(ptr noundef %11) #8
				br label %delete.end21

				delete.end21: ; preds = %delete.notnull20, %delete.end18
				ret i32 0
				}

				; Function Attrs: noinline nounwind optnone uwtable
				define internal void @_ZN1BC2Ev(ptr noundef nonnull align 8 dereferenceable(8) %this) unnamed_addr #2 comdat align 2 {
				entry:
				%this.addr = alloca ptr, align 8
				store ptr %this, ptr %this.addr, align 8
				%this1 = load ptr, ptr %this.addr, align 8
				call void @_ZN1AC2Ev(ptr noundef nonnull align 8 dereferenceable(8) %this1) #7
				store ptr getelementptr inbounds ({ [3 x ptr] }, ptr @_ZTV1B, i32 0, inrange i32 0, i32 2), ptr %this1, align 8
				ret void
				}

				; Function Attrs: noinline nounwind optnone uwtable
				define internal void @_ZN1AC2Ev(ptr noundef nonnull align 8 dereferenceable(8) %this) unnamed_addr #2 comdat align 2 {
				entry:
				%this.addr = alloca ptr, align 8
				store ptr %this, ptr %this.addr, align 8
				%this1 = load ptr, ptr %this.addr, align 8
				store ptr getelementptr inbounds ({ [3 x ptr] }, ptr @_ZTV1A, i32 0, inrange i32 0, i32 2), ptr %this1, align 8
				ret void
				}

				; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
				declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #3

				; Function Attrs: nobuiltin nounwind
				declare void @_ZdaPv(ptr noundef) #4

				declare i32 @sleep(i32 noundef) #5

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_ZN1A1xEv(ptr noundef nonnull align 8 dereferenceable(8) %this) unnamed_addr #0 comdat align 2 {
				entry:
				%this.addr = alloca ptr, align 8
				store ptr %this, ptr %this.addr, align 8
				%this1 = load ptr, ptr %this.addr, align 8
				%call = call noundef ptr @_Z3foov(), !callsite !18
				ret ptr %call
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_ZN1B1xEv(ptr noundef nonnull align 8 dereferenceable(8) %this) unnamed_addr #0 comdat align 2 {
				entry:
				%this.addr = alloca ptr, align 8
				store ptr %this, ptr %this.addr, align 8
				%this1 = load ptr, ptr %this.addr, align 8
				%call = call noundef ptr @_Z3foov(), !callsite !19
				ret ptr %call
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z3foov() #0 {
				entry:
				%call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #9, !memprof !20, !callsite !33
				ret ptr %call
				}

				; Function Attrs: nobuiltin allocsize(0)
				declare noundef nonnull ptr @_Znam(i64 noundef) #6

				attributes #0 = { mustprogress noinline optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #1 = { mustprogress noinline norecurse optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #2 = { noinline nounwind optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #3 = { nocallback nofree nounwind willreturn memory(argmem: write) }
				attributes #4 = { nobuiltin nounwind "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #5 = { "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #6 = { nobuiltin allocsize(0) "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #7 = { nounwind }
				attributes #8 = { builtin nounwind }
				attributes #9 = { builtin allocsize(0) }

				!llvm.module.flags = !{!4, !5, !6, !7, !8, !9, !10}

				!0 = !{i64 16, !"_ZTS1A"}
				!1 = !{i64 16, !"_ZTSM1AFPcvE.virtual"}
				!2 = !{i64 16, !"_ZTS1B"}
				!3 = !{i64 16, !"_ZTSM1BFPcvE.virtual"}
				!4 = !{i32 7, !"Dwarf Version", i32 5}
				!5 = !{i32 2, !"Debug Info Version", i32 3}
				!6 = !{i32 1, !"wchar_size", i32 4}
				!7 = !{i32 8, !"PIC Level", i32 2}
				!8 = !{i32 7, !"PIE Level", i32 2}
				!9 = !{i32 7, !"uwtable", i32 2}
				!10 = !{i32 7, !"frame-pointer", i32 2}
				!11 = !{i64 -4820244510750103755}
				!12 = !{i64 8632435727821051414}
				!13 = !{i64 -3421689549917153178}
				!14 = !{i64 6792096022461663180}
				!15 = !{i64 -2709642582978494015}
				!16 = !{i64 748269490701775343}
				!17 = !{i64 -5747251260480066785}
				!18 = !{i64 8256774051149711748}
				!19 = !{i64 -4831879094954754638}
				!20 = !{!21, !23, !25, !27, !29, !31}
				!21 = !{!22, !"notcold"}
				!22 = !{i64 2732490490862098848, i64 8256774051149711748, i64 -4820244510750103755, i64 748269490701775343}
				!23 = !{!24, !"cold"}
				!24 = !{i64 2732490490862098848, i64 8256774051149711748, i64 -4820244510750103755, i64 -5747251260480066785}
				!25 = !{!26, !"notcold"}
				!26 = !{i64 2732490490862098848, i64 8632435727821051414}
				!27 = !{!28, !"cold"}
				!28 = !{i64 2732490490862098848, i64 -4831879094954754638, i64 -4820244510750103755, i64 6792096022461663180}
				!29 = !{!30, !"notcold"}
				!30 = !{i64 2732490490862098848, i64 -4831879094954754638, i64 -4820244510750103755, i64 -2709642582978494015}
				!31 = !{!32, !"cold"}
				!32 = !{i64 2732490490862098848, i64 -3421689549917153178}
				!33 = !{i64 2732490490862098848}


				; DUMP: CCG before cloning:
				; DUMP: Callsite Context Graph:
				; DUMP: Node [[FOO:0x[a-z0-9]+]]
				; DUMP: Versions: 1 MIB:
				; DUMP: AllocType 1 StackIds: 6, 8, 4
				; DUMP: AllocType 2 StackIds: 6, 8, 5
				; DUMP: AllocType 1 StackIds: 0
				; DUMP: AllocType 2 StackIds: 7, 8, 2
				; DUMP: AllocType 1 StackIds: 7, 8, 3
				; DUMP: AllocType 2 StackIds: 1
				; DUMP: (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2 3 4 5 6
				; DUMP: CalleeEdges:
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[AX:0x[a-z0-9]+]] AllocTypes: NotColdCold ContextIds: 1 2
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN1:0x[a-z0-9]+]] AllocTypes: NotCold ContextIds: 3
				; DUMP: Edge from Callee [[FOO]] to Caller: [[BX:0x[a-z0-9]+]] AllocTypes: NotColdCold ContextIds: 4 5
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN2:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 6

				; DUMP: Node [[AX]]
				; DUMP: Callee: 12914368124089294956 (_Z3foov) Clones: 0 StackIds: 6 (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[AX]] AllocTypes: NotColdCold ContextIds: 1 2
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[AX]] to Caller: [[BAR:0x[a-z0-9]+]] AllocTypes: NotColdCold ContextIds: 1 2

				;; Bar contains an indirect call, with multiple targets. Its call should be null.
				; DUMP: Node [[BAR]]
				; DUMP: null Call
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2 4 5
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[AX]] to Caller: [[BAR]] AllocTypes: NotColdCold ContextIds: 1 2
				; DUMP: Edge from Callee [[BX]] to Caller: [[BAR]] AllocTypes: NotColdCold ContextIds: 4 5
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN3:0x[a-z0-9]+]] AllocTypes: NotCold ContextIds: 1
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN4:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 2
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN5:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 4
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN6:0x[a-z0-9]+]] AllocTypes: NotCold ContextIds: 5

				; DUMP: Node [[MAIN3]]
				; DUMP: Callee: 4095956691517954349 (_Z3barP1A) Clones: 0 StackIds: 4 (clone 0)
				; DUMP: AllocTypes: NotCold
				; DUMP: ContextIds: 1
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN3]] AllocTypes: NotCold ContextIds: 1
				; DUMP: CallerEdges:

				; DUMP: Node [[MAIN4]]
				; DUMP: Callee: 4095956691517954349 (_Z3barP1A) Clones: 0 StackIds: 5 (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN4]] AllocTypes: Cold ContextIds: 2
				; DUMP: CallerEdges:

				; DUMP: Node [[MAIN5]]
				; DUMP: Callee: 4095956691517954349 (_Z3barP1A) Clones: 0 StackIds: 2 (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 4
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN5]] AllocTypes: Cold ContextIds: 4
				; DUMP: CallerEdges:

				; DUMP: Node [[MAIN6]]
				; DUMP: Callee: 4095956691517954349 (_Z3barP1A) Clones: 0 StackIds: 3 (clone 0)
				; DUMP: AllocTypes: NotCold
				; DUMP: ContextIds: 5
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN6]] AllocTypes: NotCold ContextIds: 5
				; DUMP: CallerEdges:

				; DUMP: Node [[MAIN1]]
				; DUMP: Callee: 12914368124089294956 (_Z3foov) Clones: 0 StackIds: 0 (clone 0)
				; DUMP: AllocTypes: NotCold
				; DUMP: ContextIds: 3
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN1]] AllocTypes: NotCold ContextIds: 3
				; DUMP: CallerEdges:

				; DUMP: Node [[BX]]
				; DUMP: Callee: 12914368124089294956 (_Z3foov) Clones: 0 StackIds: 7 (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 4 5
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[BX]] AllocTypes: NotColdCold ContextIds: 4 5
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[BX]] to Caller: [[BAR]] AllocTypes: NotColdCold ContextIds: 4 5

				; DUMP: Node [[MAIN2]]
				; DUMP: Callee: 12914368124089294956 (_Z3foov) Clones: 0 StackIds: 1 (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 6
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN2]] AllocTypes: Cold ContextIds: 6
				; DUMP: CallerEdges:


				; DOT: digraph CallsiteContextGraph {
				; DOT: N[[FOO:0x[a-z0-9]+]] [shape="record",label="OrigId: Alloc0\n_Z3foov -\> alloc",tooltip="N[[FOO]] ContextIds: 1 2 3 4 5 6",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[MAIN1:0x[a-z0-9]+]] [shape="record",label="OrigId: 15025054523792398438\nmain -\> _Z3foov",tooltip="N[[MAIN1]] ContextIds: 6",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOT: N[[BX:0x[a-z0-9]+]] [shape="record",label="OrigId: 13614864978754796978\n_ZN1B1xEv -\> _Z3foov",tooltip="N[[BX]] ContextIds: 4 5",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				;; Bar contains an indirect call, with multiple targets. Its call should be null.
				; DOT: N[[BAR:0x[a-z0-9]+]] [shape="record",label="OrigId: 13626499562959447861\nnull call (external)",tooltip="N[[BAR]] ContextIds: 1 2 4 5",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[MAIN2:0x[a-z0-9]+]] [shape="record",label="OrigId: 15737101490731057601\nmain -\> _Z3barP1A",tooltip="N[[MAIN2]] ContextIds: 5",fillcolor="brown1",style="filled",style="filled"]; // callsite, default
				; DOT: N[[MAIN3:0x[a-z0-9]+]] [shape="record",label="OrigId: 6792096022461663180\nmain -\> _Z3barP1A",tooltip="N[[MAIN3]] ContextIds: 4",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOT: N[[MAIN4:0x[a-z0-9]+]] [shape="record",label="OrigId: 12699492813229484831\nmain -\> _Z3barP1A",tooltip="N[[MAIN4]] ContextIds: 2",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOT: N[[MAIN5:0x[a-z0-9]+]] [shape="record",label="OrigId: 748269490701775343\nmain -\> _Z3barP1A",tooltip="N[[MAIN5]] ContextIds: 1",fillcolor="brown1",style="filled",style="filled"]; // callsite, default
				; DOT: N[[MAIN6:0x[a-z0-9]+]] [shape="record",label="OrigId: 8632435727821051414\nmain -\> _Z3foov",tooltip="N[[MAIN6]] ContextIds: 3",fillcolor="brown1",style="filled",style="filled"]; // callsite, default
				; DOT: N[[AX:0x[a-z0-9]+]] [shape="record",label="OrigId: 8256774051149711748\n_ZN1A1xEv -\> _Z3foov",tooltip="N[[AX]] ContextIds: 1 2",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: // Edges:
				; DOT: N[[AX]] -> N[[FOO]][tooltip=" ContextIds: 1 2",fillcolor="mediumorchid1"]; // default|cold
				; DOT: N[[MAIN6]] -> N[[FOO]][tooltip=" ContextIds: 3",fillcolor="brown1"]; // default
				; DOT: N[[BX]] -> N[[FOO]][tooltip=" ContextIds: 4 5",fillcolor="mediumorchid1"]; // default|cold
				; DOT: N[[MAIN1]] -> N[[FOO]][tooltip=" ContextIds: 6",fillcolor="cyan"]; // cold
				; DOT: N[[BAR]] -> N[[BX]][tooltip=" ContextIds: 4 5",fillcolor="mediumorchid1"]; // default|cold
				; DOT: N[[MAIN5]] -> N[[BAR]][tooltip=" ContextIds: 1",fillcolor="brown1"]; // default
				; DOT: N[[MAIN4]] -> N[[BAR]][tooltip=" ContextIds: 2",fillcolor="cyan"]; // cold
				; DOT: N[[MAIN3]] -> N[[BAR]][tooltip=" ContextIds: 4",fillcolor="cyan"]; // cold
				; DOT: N[[MAIN2]] -> N[[BAR]][tooltip=" ContextIds: 5",fillcolor="brown1"]; // default
				; DOT: N[[BAR]] -> N[[AX]][tooltip=" ContextIds: 1 2",fillcolor="mediumorchid1"]; // default|cold
				; DOT: }
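
The caller edges of the allocation node in the DUMP checks above can be re-derived by hand from the !memprof !20 metadata. The following is a minimal Python sketch of that bookkeeping, not the pass's implementation. It assumes that context ids are assigned sequentially from 1 in MIB order and that the alloc types combine as bit flags (1 = NotCold, 2 = Cold, 3 = NotColdCold); both assumptions are consistent with the dump but are not confirmed by this diff. The names `MIBS` and `caller_edges` are hypothetical.

```python
from collections import defaultdict

# Alloc-type values as printed in the dump ("AllocType 1" / "AllocType 2").
# Treating them as combinable bit flags is an assumption of this sketch.
NOT_COLD, COLD = 1, 2
NAMES = {1: "NotCold", 2: "Cold", 3: "NotColdCold"}

# MIBs from !20, with the allocation's own frame (!33) already dropped:
# (remaining stack ids, alloc type).
MIBS = [
    ([8256774051149711748, -4820244510750103755, 748269490701775343], NOT_COLD),
    ([8256774051149711748, -4820244510750103755, -5747251260480066785], COLD),
    ([8632435727821051414], NOT_COLD),
    ([-4831879094954754638, -4820244510750103755, 6792096022461663180], COLD),
    ([-4831879094954754638, -4820244510750103755, -2709642582978494015], NOT_COLD),
    ([-3421689549917153178], COLD),
]


def caller_edges(mibs):
    """Group contexts by the first caller frame above the allocation."""
    edges = defaultdict(lambda: {"ids": [], "types": 0})
    # Assumption: context ids start at 1 and follow MIB order.
    for context_id, (frames, alloc_type) in enumerate(mibs, start=1):
        edge = edges[frames[0]]
        edge["ids"].append(context_id)
        edge["types"] |= alloc_type
    return {frame: (e["ids"], NAMES[e["types"]]) for frame, e in edges.items()}


for frame, (ids, types) in caller_edges(MIBS).items():
    print(frame, types, ids)
```

Under these assumptions the four groups line up with the four CallerEdges checked for the allocation node: contexts 1 2 (NotColdCold) via A::x, context 3 (NotCold) and context 6 (Cold) directly from main, and contexts 4 5 (NotColdCold) via B::x.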

llvm/test/ThinLTO/X86/pgho-inlined.ll

This file was added.

				;; Test callsite context graph generation for call graph with two memprof
				;; contexts and partial inlining, requiring generation of a new fused node to
				;; represent the inlined sequence while matching callsite nodes onto the graph.
				;;
				;; Original code looks like:
				;;
				;; char *bar() {
				;; return new char[10];
				;; }
				;;
				;; char *baz() {
				;; return bar();
				;; }
				;;
				;; char *foo() {
				;; return baz();
				;; }
				;;
				;; int main(int argc, char **argv) {
				;; char *x = foo();
				;; char *y = foo();
				;; memset(x, 0, 10);
				;; memset(y, 0, 10);
				;; delete[] x;
				;; sleep(10);
				;; delete[] y;
				;; return 0;
				;; }
				;;
				;; Code compiled with -mllvm -memprof-min-lifetime-cold-threshold=5 so that the
				;; memory freed after sleep(10) results in cold lifetimes.
				;;
				;; The code below was created by forcing inlining of baz into foo, and
				;; bar into baz. Due to the inlining of bar we will initially have two
				;; allocation nodes in the graph. This tests that we correctly match
				;; foo (with baz inlined) onto the graph nodes first, and generate a new
				;; fused node for it. We should then not match baz (with bar inlined) as that
				;; is not reached by the MIB contexts (since all calls from main will look
				;; like main -> foo(+baz) -> bar after the inlining reflected in this IR).

				; RUN: opt -thinlto-bc %s >%t.o
				; RUN: llvm-lto2 run %t.o -enable-pgho-context-disambiguation \
				; RUN: -r=%t.o,main,plx \
				; RUN: -r=%t.o,_ZdaPv, \
				; RUN: -r=%t.o,sleep, \
				; RUN: -r=%t.o,_Znam, \
				; RUN: -pgho-verify-ccg -pgho-verify-nodes -pgho-dump-ccg \
				; RUN: -pgho-export-to-dot -pgho-dot-file-path-prefix=%t. \
				; RUN: -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP

				; RUN: cat %t.ccg.postbuild.dot | FileCheck %s --check-prefix=DOT

				; ModuleID = 'inlined.ll'
				source_filename = "inlined.ll"
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: mustprogress uwtable
				define internal noundef ptr @_Z3barv() #0 {
				entry:
				%call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #7, !memprof !7, !callsite !12
				ret ptr %call
				}

				; Function Attrs: nobuiltin allocsize(0)
				declare noundef nonnull ptr @_Znam(i64 noundef) #1

				; Function Attrs: mustprogress uwtable
				define internal noundef ptr @_Z3bazv() #0 {
				entry:
				%call.i = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #7, !memprof !7, !callsite !13
				ret ptr %call.i
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z3foov() #2 {
				entry:
				%call.i = call noundef ptr @_Z3barv(), !callsite !14
				ret ptr %call.i
				}

				; Function Attrs: mustprogress noinline norecurse optnone uwtable
				define dso_local noundef i32 @main(i32 noundef %argc, ptr noundef %argv) #3 {
				entry:
				%retval = alloca i32, align 4
				%argc.addr = alloca i32, align 4
				%argv.addr = alloca ptr, align 8
				%x = alloca ptr, align 8
				%y = alloca ptr, align 8
				store i32 0, ptr %retval, align 4
				store i32 %argc, ptr %argc.addr, align 4
				store ptr %argv, ptr %argv.addr, align 8
				%call = call noundef ptr @_Z3foov(), !callsite !15
				store ptr %call, ptr %x, align 8
				%call1 = call noundef ptr @_Z3foov(), !callsite !16
				store ptr %call1, ptr %y, align 8
				%0 = load ptr, ptr %x, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %0, i8 0, i64 10, i1 false)
				%1 = load ptr, ptr %y, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %1, i8 0, i64 10, i1 false)
				%2 = load ptr, ptr %x, align 8
				%isnull = icmp eq ptr %2, null
				br i1 %isnull, label %delete.end, label %delete.notnull

				delete.notnull: ; preds = %entry
				call void @_ZdaPv(ptr noundef %2) #8
				br label %delete.end

				delete.end: ; preds = %delete.notnull, %entry
				%call2 = call i32 @sleep(i32 noundef 10)
				%3 = load ptr, ptr %y, align 8
				%isnull3 = icmp eq ptr %3, null
				br i1 %isnull3, label %delete.end5, label %delete.notnull4

				delete.notnull4: ; preds = %delete.end
				call void @_ZdaPv(ptr noundef %3) #8
				br label %delete.end5

				delete.end5: ; preds = %delete.notnull4, %delete.end
				ret i32 0
				}

				; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
				declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #4

				; Function Attrs: nobuiltin nounwind
				declare void @_ZdaPv(ptr noundef) #5

				declare i32 @sleep(i32 noundef) #6

				attributes #0 = { mustprogress uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #1 = { nobuiltin allocsize(0) "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #2 = { mustprogress noinline optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #3 = { mustprogress noinline norecurse optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #4 = { nocallback nofree nounwind willreturn memory(argmem: write) }
				attributes #5 = { nobuiltin nounwind "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #6 = { "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #7 = { builtin allocsize(0) }
				attributes #8 = { builtin nounwind }

				!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}

				!0 = !{i32 7, !"Dwarf Version", i32 5}
				!1 = !{i32 2, !"Debug Info Version", i32 3}
				!2 = !{i32 1, !"wchar_size", i32 4}
				!3 = !{i32 8, !"PIC Level", i32 2}
				!4 = !{i32 7, !"PIE Level", i32 2}
				!5 = !{i32 7, !"uwtable", i32 2}
				!6 = !{i32 7, !"frame-pointer", i32 2}
				!7 = !{!8, !10}
				!8 = !{!9, !"notcold"}
				!9 = !{i64 9086428284934609951, i64 -5964873800580613432, i64 2732490490862098848, i64 8632435727821051414}
				!10 = !{!11, !"cold"}
				!11 = !{i64 9086428284934609951, i64 -5964873800580613432, i64 2732490490862098848, i64 -3421689549917153178}
				!12 = !{i64 9086428284934609951}
				!13 = !{i64 9086428284934609951, i64 -5964873800580613432}
				!14 = !{i64 -5964873800580613432, i64 2732490490862098848}
				!15 = !{i64 8632435727821051414}
				!16 = !{i64 -3421689549917153178}


				; DUMP: CCG before cloning:
				; DUMP: Callsite Context Graph:
				; DUMP: Node [[BAR:0x[a-z0-9]+]]
				; DUMP: Versions: 1 MIB:
				; DUMP: AllocType 1 StackIds: 0, 1, 2
				; DUMP: AllocType 2 StackIds: 0, 1, 3
				; DUMP: (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 3 4
				; DUMP: CalleeEdges:
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[FOO:0x[a-z0-9]+]] AllocTypes: NotColdCold ContextIds: 3 4

				;; This is the node synthesized for the call to bar in foo that was created
				;; by inlining baz into foo.
				; DUMP: Node [[FOO]]
				; DUMP: Callee: 16064618363798697104 (_Z3barv) Clones: 0 StackIds: 0, 1 (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 3 4
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[FOO]] AllocTypes: NotColdCold ContextIds: 3 4
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN1:0x[a-z0-9]+]] AllocTypes: NotCold ContextIds: 3
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN2:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 4

				; DUMP: Node [[MAIN1]]
				; DUMP: Callee: 2229562716906371625 (_Z3foov) Clones: 0 StackIds: 2 (clone 0)
				; DUMP: AllocTypes: NotCold
				; DUMP: ContextIds: 1 3
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO2:0x[a-z0-9]+]] to Caller: [[MAIN1]] AllocTypes: NotCold ContextIds: 1
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN1]] AllocTypes: NotCold ContextIds: 3
				; DUMP: CallerEdges:

				; DUMP: Node [[MAIN2]]
				; DUMP: Callee: 2229562716906371625 (_Z3foov) Clones: 0 StackIds: 3 (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 2 4
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO2]] to Caller: [[MAIN2]] AllocTypes: Cold ContextIds: 2
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN2]] AllocTypes: Cold ContextIds: 4
				; DUMP: CallerEdges:

				; DUMP: Node [[BAZ:0x[a-z0-9]+]]
				; DUMP: Versions: 1 MIB:
				; DUMP: AllocType 1 StackIds: 1, 2
				; DUMP: AllocType 2 StackIds: 1, 3
				; DUMP: (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2
				; DUMP: CalleeEdges:
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[BAZ]] to Caller: [[FOO2]] AllocTypes: NotColdCold ContextIds: 1 2

				;; This is leftover from the MIB on the alloc inlined into baz. It is not
				;; matched with any call, since there is no such node in the IR. Due to the
				;; null call it will not participate in any context transformations.
				; DUMP: Node [[FOO2]]
				; DUMP: null Call
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAZ]] to Caller: [[FOO2]] AllocTypes: NotColdCold ContextIds: 1 2
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[FOO2]] to Caller: [[MAIN1]] AllocTypes: NotCold ContextIds: 1
				; DUMP: Edge from Callee [[FOO2]] to Caller: [[MAIN2]] AllocTypes: Cold ContextIds: 2


				; DOT: digraph CallsiteContextGraph {
				; DOT: N[[BAR:0x[a-z0-9]+]] [shape="record",label="OrigId: Alloc0\n_Z3bazv -\> alloc",tooltip="N[[BAR]] ContextIds: 1 2",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[FOO:0x[a-z0-9]+]] [shape="record",label="OrigId: 2732490490862098848\nnull call (external)",tooltip="N[[FOO]] ContextIds: 1 2",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[MAIN2:0x[a-z0-9]+]] [shape="record",label="OrigId: 15025054523792398438\nmain -\> _Z3foov",tooltip="N[[MAIN2]] ContextIds: 2 4",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOT: N[[MAIN1:0x[a-z0-9]+]] [shape="record",label="OrigId: 8632435727821051414\nmain -\> _Z3foov",tooltip="N[[MAIN1]] ContextIds: 1 3",fillcolor="brown1",style="filled",style="filled"]; // callsite, default
				; DOT: N[[BAZ:0x[a-z0-9]+]] [shape="record",label="OrigId: Alloc2\n_Z3barv -\> alloc",tooltip="N[[BAZ]] ContextIds: 3 4",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[FOO2:0x[a-z0-9]+]] [shape="record",label="OrigId: 0\n_Z3foov -\> _Z3barv",tooltip="N[[FOO2]] ContextIds: 3 4",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: // Edges:
				; DOT: N[[FOO]] -> N[[BAR]][tooltip=" ContextIds: 1 2",fillcolor="mediumorchid1"]; // default|cold
				; DOT: N[[MAIN1]] -> N[[FOO]][tooltip=" ContextIds: 1",fillcolor="brown1"]; // default
				; DOT: N[[MAIN2]] -> N[[FOO]][tooltip=" ContextIds: 2",fillcolor="cyan"]; // cold
				; DOT: N[[FOO2]] -> N[[BAZ]][tooltip=" ContextIds: 3 4",fillcolor="mediumorchid1"]; // default|cold
				; DOT: N[[MAIN1]] -> N[[FOO2]][tooltip=" ContextIds: 3",fillcolor="brown1"]; // default
				; DOT: N[[MAIN2]] -> N[[FOO2]][tooltip=" ContextIds: 4",fillcolor="cyan"]; // cold
				; DOT: }
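
The header comment of this test says the call in foo (with baz inlined) is matched onto the graph and gets a fused node, while the allocation inlined into baz leaves behind a node with a null call. A rough way to see why is to treat each !callsite as a run of stack ids and look for it in the MIB contexts. The Python sketch below does only that. It is a simplification for illustration, not the matching algorithm in the patch, and the helper names (`find_run`, `trimmed`) and frame constants are hypothetical.

```python
def find_run(context, callsite):
    """Return the index where `callsite` occurs contiguously in `context`, else -1."""
    n = len(callsite)
    for start in range(len(context) - n + 1):
        if context[start:start + n] == callsite:
            return start
    return -1


def trimmed(contexts, alloc_callsite):
    """Drop the leading frames already covered by the allocation call's own !callsite."""
    return [c[len(alloc_callsite):] for c in contexts]


# Stack ids taken from the metadata above, named after the function each frame is in.
IN_BAR, IN_BAZ, IN_FOO = 9086428284934609951, -5964873800580613432, 2732490490862098848
MAIN_X, MAIN_Y = 8632435727821051414, -3421689549917153178

# The two contexts in !memprof !7 (notcold via MAIN_X, cold via MAIN_Y).
full_contexts = [[IN_BAR, IN_BAZ, IN_FOO, MAIN_X], [IN_BAR, IN_BAZ, IN_FOO, MAIN_Y]]

# Allocation in bar (!callsite !12) and its copy inlined into baz (!callsite !13).
bar_contexts = trimmed(full_contexts, [IN_BAR])
baz_contexts = trimmed(full_contexts, [IN_BAR, IN_BAZ])

# !14: the call to bar in foo after baz was inlined, so it carries two frames.
foo_callsite = [IN_BAZ, IN_FOO]

print([find_run(c, foo_callsite) for c in bar_contexts])  # run found in both contexts
print([find_run(c, foo_callsite) for c in baz_contexts])  # run found in neither
```

In this simplified view the two-frame callsite from foo lines up with the contexts of the allocation in bar, which corresponds to the fused node with two StackIds in the dump. The contexts of the allocation inlined into baz start at the single frame in foo, and no call in the IR has a !callsite consisting of just that frame, which fits the "null Call" node the test expects.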

llvm/test/Transforms/PGHOContextDisambiguation/basic.ll

This file was added.

				;; Test callsite context graph generation for simple call graph with
				;; two memprof contexts and no inlining.
				;;
				;; Original code looks like:
				;;
				;; char *bar() {
				;; return new char[10];
				;; }
				;;
				;; char *baz() {
				;; return bar();
				;; }
				;;
				;; char *foo() {
				;; return baz();
				;; }
				;;
				;; int main(int argc, char **argv) {
				;; char *x = foo();
				;; char *y = foo();
				;; memset(x, 0, 10);
				;; memset(y, 0, 10);
				;; delete[] x;
				;; sleep(10);
				;; delete[] y;
				;; return 0;
				;; }
				;;
				;; Code compiled with -mllvm -memprof-min-lifetime-cold-threshold=5 so that the
				;; memory freed after sleep(10) results in cold lifetimes.
				davidxl: Why is the option called '-lifetime-cold-'? Should it be '-lifetime-short-'?
				tejohnson (author): This is the minimum lifetime required in order to mark the context as cold when we do profile matching.

				; RUN: opt -passes=pgho-context-disambiguation \
				; RUN: -pgho-verify-ccg -pgho-verify-nodes -pgho-dump-ccg \
				; RUN: -pgho-export-to-dot -pgho-dot-file-path-prefix=%t. \
				; RUN: %s -S 2>&1 | FileCheck %s --check-prefix=DUMP

				; RUN: cat %t.ccg.postbuild.dot | FileCheck %s --check-prefix=DOT

				; ModuleID = 'basic.ll'
				source_filename = "basic.ll"
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: mustprogress noinline norecurse optnone uwtable
				define dso_local noundef i32 @main(i32 noundef %argc, ptr noundef %argv) #0 {
				entry:
				%retval = alloca i32, align 4
				%argc.addr = alloca i32, align 4
				%argv.addr = alloca ptr, align 8
				%x = alloca ptr, align 8
				%y = alloca ptr, align 8
				store i32 0, ptr %retval, align 4
				store i32 %argc, ptr %argc.addr, align 4
				store ptr %argv, ptr %argv.addr, align 8
				%call = call noundef ptr @_Z3foov(), !callsite !7
				store ptr %call, ptr %x, align 8
				%call1 = call noundef ptr @_Z3foov(), !callsite !8
				store ptr %call1, ptr %y, align 8
				%0 = load ptr, ptr %x, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %0, i8 0, i64 10, i1 false)
				%1 = load ptr, ptr %y, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %1, i8 0, i64 10, i1 false)
				%2 = load ptr, ptr %x, align 8
				%isnull = icmp eq ptr %2, null
				br i1 %isnull, label %delete.end, label %delete.notnull

				delete.notnull: ; preds = %entry
				call void @_ZdaPv(ptr noundef %2) #6
				br label %delete.end

				delete.end: ; preds = %delete.notnull, %entry
				%call2 = call i32 @sleep(i32 noundef 10)
				%3 = load ptr, ptr %y, align 8
				%isnull3 = icmp eq ptr %3, null
				br i1 %isnull3, label %delete.end5, label %delete.notnull4

				delete.notnull4: ; preds = %delete.end
				call void @_ZdaPv(ptr noundef %3) #6
				br label %delete.end5

				delete.end5: ; preds = %delete.notnull4, %delete.end
				ret i32 0
				}

				; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
				declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1

				; Function Attrs: nobuiltin nounwind
				declare void @_ZdaPv(ptr noundef) #2

				declare i32 @sleep(i32 noundef) #3

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z3barv() #4 {
				entry:
				%call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #7, !memprof !9, !callsite !14
				ret ptr %call
				}

				; Function Attrs: nobuiltin allocsize(0)
				declare noundef nonnull ptr @_Znam(i64 noundef) #5

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z3bazv() #4 {
				entry:
				%call = call noundef ptr @_Z3barv(), !callsite !15
				ret ptr %call
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z3foov() #4 {
				entry:
				%call = call noundef ptr @_Z3bazv(), !callsite !16
				ret ptr %call
				}

				attributes #0 = { mustprogress noinline norecurse optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #1 = { nocallback nofree nounwind willreturn memory(argmem: write) }
				attributes #2 = { nobuiltin nounwind "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #3 = { "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #4 = { mustprogress noinline optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #5 = { nobuiltin allocsize(0) "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #6 = { builtin nounwind }
				attributes #7 = { builtin allocsize(0) }

				!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}

				!0 = !{i32 7, !"Dwarf Version", i32 5}
				!1 = !{i32 2, !"Debug Info Version", i32 3}
				!2 = !{i32 1, !"wchar_size", i32 4}
				!3 = !{i32 8, !"PIC Level", i32 2}
				!4 = !{i32 7, !"PIE Level", i32 2}
				!5 = !{i32 7, !"uwtable", i32 2}
				!6 = !{i32 7, !"frame-pointer", i32 2}
				!7 = !{i64 8632435727821051414}
				!8 = !{i64 -3421689549917153178}
				!9 = !{!10, !12}
				!10 = !{!11, !"notcold"}
				!11 = !{i64 9086428284934609951, i64 -5964873800580613432, i64 2732490490862098848, i64 8632435727821051414}
				!12 = !{!13, !"cold"}
				!13 = !{i64 9086428284934609951, i64 -5964873800580613432, i64 2732490490862098848, i64 -3421689549917153178}
				!14 = !{i64 9086428284934609951}
				!15 = !{i64 -5964873800580613432}
				!16 = !{i64 2732490490862098848}

				; DUMP: CCG before cloning:
				; DUMP: Callsite Context Graph:
				; DUMP: Node [[BAR:0x[a-z0-9]+]]
				; DUMP: %call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #6 (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2
				; DUMP: CalleeEdges:
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[BAZ:0x[a-z0-9]+]] AllocTypes: NotColdCold ContextIds: 1 2

				; DUMP: Node [[BAZ]]
				; DUMP: %call = call noundef ptr @_Z3barv() (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[BAZ]] AllocTypes: NotColdCold ContextIds: 1 2
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[BAZ]] to Caller: [[FOO:0x[a-z0-9]+]] AllocTypes: NotColdCold ContextIds: 1 2

				; DUMP: Node [[FOO]]
				; DUMP: %call = call noundef ptr @_Z3bazv() (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAZ]] to Caller: [[FOO]] AllocTypes: NotColdCold ContextIds: 1 2
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN1:0x[a-z0-9]+]] AllocTypes: NotCold ContextIds: 1
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN2:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 2

				; DUMP: Node [[MAIN1]]
				; DUMP: %call = call noundef ptr @_Z3foov() (clone 0)
				; DUMP: AllocTypes: NotCold
				; DUMP: ContextIds: 1
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN1]] AllocTypes: NotCold ContextIds: 1
				; DUMP: CallerEdges:

				; DUMP: Node [[MAIN2]]
				; DUMP: %call1 = call noundef ptr @_Z3foov() (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN2]] AllocTypes: Cold ContextIds: 2
				; DUMP: CallerEdges:


				; DOT: digraph CallsiteContextGraph {
				; DOT: N[[BAR:0x[a-z0-9]+]] [shape="record",label="OrigId: Alloc0\n_Z3barv -\> _Znam",tooltip="N[[BAR]] ContextIds: 1 2",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[BAZ:0x[a-z0-9]+]] [shape="record",label="OrigId: 12481870273128938184\n_Z3bazv -\> _Z3barv",tooltip="N[[BAZ]] ContextIds: 1 2",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[FOO:0x[a-z0-9]+]] [shape="record",label="OrigId: 2732490490862098848\n_Z3foov -\> _Z3bazv",tooltip="N[[FOO]] ContextIds: 1 2",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[MAIN1:0x[a-z0-9]+]] [shape="record",label="OrigId: 15025054523792398438\nmain -\> _Z3foov",tooltip="N[[MAIN1]] ContextIds: 2",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOT: N[[MAIN2:0x[a-z0-9]+]] [shape="record",label="OrigId: 8632435727821051414\nmain -\> _Z3foov",tooltip="N[[MAIN2]] ContextIds: 1",fillcolor="brown1",style="filled",style="filled"]; // callsite, default
				; DOT: // Edges:
				; DOT: N[[BAZ]] -> N[[BAR]][tooltip=" ContextIds: 1 2",fillcolor="mediumorchid1"]; // default|cold
				; DOT: N[[FOO]] -> N[[BAZ]][tooltip=" ContextIds: 1 2",fillcolor="mediumorchid1"]; // default|cold
				; DOT: N[[MAIN2]] -> N[[FOO]][tooltip=" ContextIds: 1",fillcolor="brown1"]; // default
				; DOT: N[[MAIN1]] -> N[[FOO]][tooltip=" ContextIds: 2",fillcolor="cyan"]; // cold
				; DOT: }

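The DUMP output above can be read as the result of walking each MIB's stack ids from the allocation outward and tagging every node and caller edge with that MIB's context id and allocation type. The following is a minimal C++ sketch of that bookkeeping only, not the patch's CallsiteContextGraph implementation. `MIB`, `Info`, `ContextGraph`, and `buildGraph` are hypothetical names, nodes are keyed directly by stack id, and context ids are assumed to be handed out in MIB order starting at 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Allocation behaviour bits. A node or edge reached by both kinds of context
// carries both bits (printed as "NotColdCold" in the dumps).
enum AllocType : uint8_t { None = 0, NotCold = 1, Cold = 2 };

// One MIB: the stack ids of a profiled context (allocation frame first,
// outermost caller last) plus the behaviour recorded for that context.
struct MIB {
  std::vector<int64_t> StackIds;
  AllocType Type;
};

// Context ids and merged allocation types attached to a node or an edge.
struct Info {
  std::set<unsigned> ContextIds;
  uint8_t AllocTypes = None;
};

// Hypothetical, simplified graph: nodes keyed by stack id, caller edges keyed
// by (callee stack id, caller stack id).
struct ContextGraph {
  std::map<int64_t, Info> Nodes;
  std::map<std::pair<int64_t, int64_t>, Info> Edges;
};

ContextGraph buildGraph(const std::vector<MIB> &MIBs) {
  ContextGraph G;
  unsigned LastContextId = 0;
  for (const MIB &M : MIBs) {
    unsigned Id = ++LastContextId; // assumption: one fresh id per MIB
    for (std::size_t I = 0; I < M.StackIds.size(); ++I) {
      Info &N = G.Nodes[M.StackIds[I]];
      N.ContextIds.insert(Id);
      N.AllocTypes |= M.Type;
      if (I + 1 < M.StackIds.size()) {
        // Edge from this frame (callee) to the next frame out (caller).
        Info &E = G.Edges[std::make_pair(M.StackIds[I], M.StackIds[I + 1])];
        E.ContextIds.insert(Id);
        E.AllocTypes |= M.Type;
      }
    }
  }
  return G;
}
```

Fed the two MIBs from basic.ll (`!11` as notcold, `!13` as cold), this model produces five nodes. The three frames the contexts share carry context ids 1 and 2 with both allocation type bits set, while the edges into the two `main` callsites carry id 1 (NotCold) and id 2 (Cold) respectively, mirroring the dump.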
llvm/test/Transforms/PGHOContextDisambiguation/duplicate-context-ids.ll

This file was added.

				;; Test callsite context graph generation for call graph with MIBs
				;; that have pruned contexts that partially match multiple inlined
				;; callsite contexts, requiring duplication of context ids and nodes
				;; while matching callsite nodes onto the graph.
				;;
				;; Original code looks like:
				;;
				;; char *D() {
				;; return new char[10];
				;; }
				;;
				;; char *F() {
				;; return D();
				;; }
				;;
				;; char *C() {
				;; return D();
				;; }
				;;
				;; char *B() {
				;; return C();
				;; }
				;;
				;; char *E() {
				;; return C();
				;; }
				;; int main(int argc, char **argv) {
				;; char *x = B(); // cold
				;; char *y = E(); // cold
				;; char *z = F(); // default
				;; memset(x, 0, 10);
				;; memset(y, 0, 10);
				;; memset(z, 0, 10);
				;; delete[] z;
				;; sleep(10);
				;; delete[] x;
				;; delete[] y;
				;; return 0;
				;; }
				;;
				;; Code compiled with -mllvm -memprof-min-lifetime-cold-threshold=5 so that the
				;; memory freed after sleep(10) results in cold lifetimes.
				;;
				;; The code below was created by forcing inlining of C into both B and E.
				;; Since both allocation contexts via C are cold, the matched memprof
				;; metadata has the context pruned above C's callsite. This requires
				;; matching the stack node for C to callsites where it was inlined (i.e.
				;; the callsites in B and E that have callsite metadata that includes C's).
				;; It also requires duplication of that node in the graph as well as the
				;; duplication of the context ids along that path through the graph,
				;; so that we can represent the duplicated (via inlining) C callsite.

				; RUN: opt -passes=pgho-context-disambiguation \
				; RUN: -pgho-verify-ccg -pgho-verify-nodes -pgho-dump-ccg \
				; RUN: -pgho-export-to-dot -pgho-dot-file-path-prefix=%t. \
				; RUN: %s -S 2>&1 | FileCheck %s --check-prefix=DUMP

				; RUN: cat %t.ccg.prestackupdate.dot | FileCheck %s --check-prefix=DOTPRE
				; RUN: cat %t.ccg.postbuild.dot | FileCheck %s --check-prefix=DOTPOST

				; ModuleID = 'duplicate-context-ids.ll'
				source_filename = "duplicate-context-ids.ll"
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z1Dv() #0 {
				entry:
				%call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #6, !memprof !7, !callsite !12
				ret ptr %call
				}

				; Function Attrs: nobuiltin allocsize(0)
				declare noundef nonnull ptr @_Znam(i64 noundef) #1

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z1Fv() #0 {
				entry:
				%call = call noundef ptr @_Z1Dv(), !callsite !13
				ret ptr %call
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z1Cv() #0 {
				entry:
				%call = call noundef ptr @_Z1Dv(), !callsite !14
				ret ptr %call
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z1Bv() #0 {
				entry:
				%call.i = call noundef ptr @_Z1Dv(), !callsite !15
				ret ptr %call.i
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z1Ev() #0 {
				entry:
				%call.i = call noundef ptr @_Z1Dv(), !callsite !16
				ret ptr %call.i
				}

				; Function Attrs: mustprogress noinline norecurse optnone uwtable
				define dso_local noundef i32 @main(i32 noundef %argc, ptr noundef %argv) #2 {
				entry:
				%retval = alloca i32, align 4
				%argc.addr = alloca i32, align 4
				%argv.addr = alloca ptr, align 8
				%x = alloca ptr, align 8
				%y = alloca ptr, align 8
				%z = alloca ptr, align 8
				store i32 0, ptr %retval, align 4
				store i32 %argc, ptr %argc.addr, align 4
				store ptr %argv, ptr %argv.addr, align 8
				%call = call noundef ptr @_Z1Bv(), !callsite !17
				store ptr %call, ptr %x, align 8
				%call1 = call noundef ptr @_Z1Ev(), !callsite !18
				store ptr %call1, ptr %y, align 8
				%call2 = call noundef ptr @_Z1Fv(), !callsite !19
				store ptr %call2, ptr %z, align 8
				%0 = load ptr, ptr %x, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %0, i8 0, i64 10, i1 false)
				%1 = load ptr, ptr %y, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %1, i8 0, i64 10, i1 false)
				%2 = load ptr, ptr %z, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %2, i8 0, i64 10, i1 false)
				%3 = load ptr, ptr %z, align 8
				%isnull = icmp eq ptr %3, null
				br i1 %isnull, label %delete.end, label %delete.notnull

				delete.notnull: ; preds = %entry
				call void @_ZdaPv(ptr noundef %3) #7
				br label %delete.end

				delete.end: ; preds = %delete.notnull, %entry
				%call3 = call i32 @sleep(i32 noundef 10)
				%4 = load ptr, ptr %x, align 8
				%isnull4 = icmp eq ptr %4, null
				br i1 %isnull4, label %delete.end6, label %delete.notnull5

				delete.notnull5: ; preds = %delete.end
				call void @_ZdaPv(ptr noundef %4) #7
				br label %delete.end6

				delete.end6: ; preds = %delete.notnull5, %delete.end
				%5 = load ptr, ptr %y, align 8
				%isnull7 = icmp eq ptr %5, null
				br i1 %isnull7, label %delete.end9, label %delete.notnull8

				delete.notnull8: ; preds = %delete.end6
				call void @_ZdaPv(ptr noundef %5) #7
				br label %delete.end9

				delete.end9: ; preds = %delete.notnull8, %delete.end6
				ret i32 0
				}

				; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
				declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #3

				; Function Attrs: nobuiltin nounwind
				declare void @_ZdaPv(ptr noundef) #4

				declare i32 @sleep(i32 noundef) #5

				attributes #0 = { mustprogress noinline optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #1 = { nobuiltin allocsize(0) "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #2 = { mustprogress noinline norecurse optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #3 = { nocallback nofree nounwind willreturn memory(argmem: write) }
				attributes #4 = { nobuiltin nounwind "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #5 = { "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #6 = { builtin allocsize(0) }
				attributes #7 = { builtin nounwind }

				!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}

				!0 = !{i32 7, !"Dwarf Version", i32 5}
				!1 = !{i32 2, !"Debug Info Version", i32 3}
				!2 = !{i32 1, !"wchar_size", i32 4}
				!3 = !{i32 8, !"PIC Level", i32 2}
				!4 = !{i32 7, !"PIE Level", i32 2}
				!5 = !{i32 7, !"uwtable", i32 2}
				!6 = !{i32 7, !"frame-pointer", i32 2}
				!7 = !{!8, !10}
				!8 = !{!9, !"cold"}
				!9 = !{i64 6541423618768552252, i64 -6270142974039008131}
				!10 = !{!11, !"notcold"}
				!11 = !{i64 6541423618768552252, i64 -4903163940066524832}
				!12 = !{i64 6541423618768552252}
				!13 = !{i64 -4903163940066524832}
				!14 = !{i64 -6270142974039008131}
				!15 = !{i64 -6270142974039008131, i64 -184525619819294889}
				!16 = !{i64 -6270142974039008131, i64 1905834578520680781}
				!17 = !{i64 8632435727821051414}
				!18 = !{i64 -3421689549917153178}
				!19 = !{i64 6307901912192269588}


				;; After adding only the alloc node memprof metadata, we only have 2 contexts.

				; DUMP: CCG before updating call stack chains:
				; DUMP: Callsite Context Graph:
				; DUMP: Node [[D:0x[a-z0-9]+]]
				; DUMP: %call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #6 (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2
				; DUMP: CalleeEdges:
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[C:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 1
				; DUMP: Edge from Callee [[D]] to Caller: [[F:0x[a-z0-9]+]] AllocTypes: NotCold ContextIds: 2

				; DUMP: Node [[C]]
				; DUMP: null Call
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 1
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[C]] AllocTypes: Cold ContextIds: 1
				; DUMP: CallerEdges:

				; DUMP: Node [[F]]
				; DUMP: null Call
				; DUMP: AllocTypes: NotCold
				; DUMP: ContextIds: 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[F]] AllocTypes: NotCold ContextIds: 2
				; DUMP: CallerEdges:

				;; After updating for callsite metadata, we should have generated context ids 3 and 4,
				;; along with 2 new nodes for those callsites. All have the same allocation type
				;; behavior as the original C node.

				; DUMP: CCG before cloning:
				; DUMP: Callsite Context Graph:
				; DUMP: Node [[D]]
				; DUMP: %call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #6 (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2 3 4
				; DUMP: CalleeEdges:
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[F]] AllocTypes: NotCold ContextIds: 2
				; DUMP: Edge from Callee [[D]] to Caller: [[C2:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 3
				; DUMP: Edge from Callee [[D]] to Caller: [[B:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 4
				; DUMP: Edge from Callee [[D]] to Caller: [[E:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 1

				; DUMP: Node [[F]]
				; DUMP: %call = call noundef ptr @_Z1Dv() (clone 0)
				; DUMP: AllocTypes: NotCold
				; DUMP: ContextIds: 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[F]] AllocTypes: NotCold ContextIds: 2
				; DUMP: CallerEdges:

				; DUMP: Node [[C2]]
				; DUMP: %call = call noundef ptr @_Z1Dv() (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 3
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[C2]] AllocTypes: Cold ContextIds: 3
				; DUMP: CallerEdges:

				; DUMP: Node [[B]]
				; DUMP: %call.i = call noundef ptr @_Z1Dv() (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 4
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[B]] AllocTypes: Cold ContextIds: 4
				; DUMP: CallerEdges:

				; DUMP: Node [[E]]
				; DUMP: %call.i = call noundef ptr @_Z1Dv() (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 1
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[D]] to Caller: [[E]] AllocTypes: Cold ContextIds: 1
				; DUMP: CallerEdges:


				; DOTPRE: digraph CallsiteContextGraph {
				; DOTPRE: N[[D:0x[a-z0-9]+]] [shape="record",label="OrigId: Alloc0\n_Z1Dv -\> _Znam",tooltip="N[[D]] ContextIds: 1 2",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOTPRE: N[[F:0x[a-z0-9]+]] [shape="record",label="OrigId: 13543580133643026784\nnull call (external)",tooltip="N[[F]] ContextIds: 2",fillcolor="brown1",style="filled",style="filled"]; // callsite, default
				; DOTPRE: N[[C:0x[a-z0-9]+]] [shape="record",label="OrigId: 12176601099670543485\nnull call (external)",tooltip="N[[C]] ContextIds: 1",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOTPRE: // Edges:
				; DOTPRE: N[[C]] -> N[[D]][tooltip=" ContextIds: 1",fillcolor="cyan"]; // cold
				; DOTPRE: N[[F]] -> N[[D]][tooltip=" ContextIds: 2",fillcolor="brown1"]; // default
				; DOTPRE: }


				; DOTPOST: digraph CallsiteContextGraph {
				; DOTPOST: N[[D:0x[a-z0-9]+]] [shape="record",label="OrigId: Alloc0\n_Z1Dv -\> _Znam",tooltip="N[[D]] ContextIds: 1 2 3 4",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOTPOST: N[[E:0x[a-z0-9]+]] [shape="record",label="OrigId: 0\n_Z1Ev -\> _Z1Dv",tooltip="N[[E]] ContextIds: 1",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOTPOST: N[[B:0x[a-z0-9]+]] [shape="record",label="OrigId: 0\n_Z1Bv -\> _Z1Dv",tooltip="N[[B]] ContextIds: 4",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOTPOST: N[[C:0x[a-z0-9]+]] [shape="record",label="OrigId: 0\n_Z1Cv -\> _Z1Dv",tooltip="N[[C]] ContextIds: 3",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOTPOST: N[[F:0x[a-z0-9]+]] [shape="record",label="OrigId: 13543580133643026784\n_Z1Fv -\> _Z1Dv",tooltip="N[[F]] ContextIds: 2",fillcolor="brown1",style="filled",style="filled"]; // callsite, default
				; DOTPOST: // Edges:
				; DOTPOST: N[[F]] -> N[[D]][tooltip=" ContextIds: 2",fillcolor="brown1"]; // default
				; DOTPOST: N[[C]] -> N[[D]][tooltip=" ContextIds: 3",fillcolor="cyan"]; // cold
				; DOTPOST: N[[B]] -> N[[D]][tooltip=" ContextIds: 4",fillcolor="cyan"]; // cold
				; DOTPOST: N[[E]] -> N[[D]][tooltip=" ContextIds: 1",fillcolor="cyan"]; // cold
				; DOTPOST: }

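The duplication described in this test's header comment can be pictured as follows: once a pruned context ends at the stack node for C, every IR callsite whose callsite metadata starts with C's stack id needs its own node and its own copy of the context ids. The C++ sketch below illustrates only that id bookkeeping under loud assumptions. `Callsite` and `assignContextIds` are hypothetical names, matching is done on the innermost frame alone, and which callsite keeps the original ids is an arbitrary choice here that need not agree with the ordering in the dumps above.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// A call instruction and its !callsite stack ids, innermost frame first.
struct Callsite {
  std::string Name;              // label for the enclosing function
  std::vector<int64_t> StackIds; // more than one entry means inlined frames
};

// For every callsite whose innermost frame is NodeStackId, return the context
// ids it ends up owning. The first match keeps the node's original ids; each
// later match gets freshly numbered duplicates (assumption of this sketch).
std::map<std::string, std::set<unsigned>>
assignContextIds(int64_t NodeStackId, const std::set<unsigned> &NodeIds,
                 const std::vector<Callsite> &Calls, unsigned &LastContextId) {
  std::map<std::string, std::set<unsigned>> Result;
  bool OriginalTaken = false;
  for (const Callsite &CS : Calls) {
    if (CS.StackIds.empty() || CS.StackIds.front() != NodeStackId)
      continue; // this callsite does not correspond to the stack node
    if (!OriginalTaken) {
      Result[CS.Name] = NodeIds;
      OriginalTaken = true;
      continue;
    }
    // Duplicate each original context id under a new number.
    std::set<unsigned> Duplicates;
    for (std::size_t I = 0; I < NodeIds.size(); ++I)
      Duplicates.insert(++LastContextId);
    Result[CS.Name] = Duplicates;
  }
  return Result;
}
```

With the metadata from this test (C's stack id in `!14`, `!15`, and `!16`, original context id 1, and two ids already in use), the three matching callsites in C, B, and E end up with one cold context id each drawn from the set 1, 3, 4, and the callsite in F is left alone. That agrees with the "ContextIds: 1 2 3 4" total on the allocation node after the update.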
llvm/test/Transforms/PGHOContextDisambiguation/indirectcall.ll

This file was added.

				;; Tests callsite context graph generation for call graph containing indirect
				;; calls. Currently this should result in conservative behavior, such that the
				;; indirect call receives a null call in its graph node, to prevent subsequent
				;; cloning.
				;;
				;; Original code looks like:
				;;
				;; char *foo() {
				;; return new char[10];
				;; }
				;; class A {
				;; public:
				;; virtual char *x() { return foo(); }
				;; };
				;; class B : public A {
				;; public:
				;; char *x() final { return foo(); }
				;; };
				;; char *bar(A *a) {
				;; return a->x();
				;; }
				;; int main(int argc, char **argv) {
				;; char *x = foo();
				;; char *y = foo();
				;; B b;
				;; char *z = bar(&b);
				;; char *w = bar(&b);
				;; A a;
				;; char *r = bar(&a);
				;; char *s = bar(&a);
				;; memset(x, 0, 10);
				;; memset(y, 0, 10);
				;; memset(z, 0, 10);
				;; memset(w, 0, 10);
				;; memset(r, 0, 10);
				;; memset(s, 0, 10);
				;; delete[] x;
				;; delete[] w;
				;; delete[] r;
				;; sleep(10);
				;; delete[] y;
				;; delete[] z;
				;; delete[] s;
				;; return 0;
				;; }
				;;
				;; Code compiled with -mllvm -memprof-min-lifetime-cold-threshold=5 so that the
				;; memory freed after sleep(10) results in cold lifetimes.
				;;
				;; Compiled without optimization to prevent inlining and devirtualization.

				; RUN: opt -passes=pgho-context-disambiguation \
				; RUN: -pgho-verify-ccg -pgho-verify-nodes -pgho-dump-ccg \
				; RUN: -pgho-export-to-dot -pgho-dot-file-path-prefix=%t. \
				; RUN: %s -S 2>&1 | FileCheck %s --check-prefix=DUMP

				; RUN: cat %t.ccg.postbuild.dot | FileCheck %s --check-prefix=DOT

				; ModuleID = 'indirectcall.ll'
				source_filename = "indirectcall.ll"
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				%class.B = type { %class.A }
				%class.A = type { ptr }

				$_ZN1BC2Ev = comdat any

				$_ZN1AC2Ev = comdat any

				$_ZN1A1xEv = comdat any

				$_ZN1B1xEv = comdat any

				$_ZTV1B = comdat any

				$_ZTS1B = comdat any

				$_ZTS1A = comdat any

				$_ZTI1A = comdat any

				$_ZTI1B = comdat any

				$_ZTV1A = comdat any

				@_ZTV1B = internal unnamed_addr constant { [3 x ptr] } { [3 x ptr] [ptr null, ptr @_ZTI1B, ptr @_ZN1B1xEv] }, comdat, align 8, !type !0, !type !1, !type !2, !type !3
				@_ZTVN10__cxxabiv120__si_class_type_infoE = external global ptr
				@_ZTS1B = internal constant [3 x i8] c"1B\00", comdat, align 1
				@_ZTVN10__cxxabiv117__class_type_infoE = external global ptr
				@_ZTS1A = internal constant [3 x i8] c"1A\00", comdat, align 1
				@_ZTI1A = internal constant { ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv117__class_type_infoE, i64 2), ptr @_ZTS1A }, comdat, align 8
				@_ZTI1B = internal constant { ptr, ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv120__si_class_type_infoE, i64 2), ptr @_ZTS1B, ptr @_ZTI1A }, comdat, align 8
				@_ZTV1A = internal unnamed_addr constant { [3 x ptr] } { [3 x ptr] [ptr null, ptr @_ZTI1A, ptr @_ZN1A1xEv] }, comdat, align 8, !type !0, !type !1

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z3barP1A(ptr noundef %a) #0 {
				entry:
				%a.addr = alloca ptr, align 8
				store ptr %a, ptr %a.addr, align 8
				%0 = load ptr, ptr %a.addr, align 8
				%vtable = load ptr, ptr %0, align 8
				%vfn = getelementptr inbounds ptr, ptr %vtable, i64 0
				%1 = load ptr, ptr %vfn, align 8
				%call = call noundef ptr %1(ptr noundef nonnull align 8 dereferenceable(8) %0), !callsite !11
				ret ptr %call
				}

				; Function Attrs: mustprogress noinline norecurse optnone uwtable
				define dso_local noundef i32 @main(i32 noundef %argc, ptr noundef %argv) #1 {
				entry:
				%retval = alloca i32, align 4
				%argc.addr = alloca i32, align 4
				%argv.addr = alloca ptr, align 8
				%x = alloca ptr, align 8
				%y = alloca ptr, align 8
				%b = alloca %class.B, align 8
				%z = alloca ptr, align 8
				%w = alloca ptr, align 8
				%a = alloca %class.A, align 8
				%r = alloca ptr, align 8
				%s = alloca ptr, align 8
				store i32 0, ptr %retval, align 4
				store i32 %argc, ptr %argc.addr, align 4
				store ptr %argv, ptr %argv.addr, align 8
				%call = call noundef ptr @_Z3foov(), !callsite !12
				store ptr %call, ptr %x, align 8
				%call1 = call noundef ptr @_Z3foov(), !callsite !13
				store ptr %call1, ptr %y, align 8
				call void @_ZN1BC2Ev(ptr noundef nonnull align 8 dereferenceable(8) %b) #7
				%call2 = call noundef ptr @_Z3barP1A(ptr noundef %b), !callsite !14
				store ptr %call2, ptr %z, align 8
				%call3 = call noundef ptr @_Z3barP1A(ptr noundef %b), !callsite !15
				store ptr %call3, ptr %w, align 8
				call void @_ZN1AC2Ev(ptr noundef nonnull align 8 dereferenceable(8) %a) #7
				%call4 = call noundef ptr @_Z3barP1A(ptr noundef %a), !callsite !16
				store ptr %call4, ptr %r, align 8
				%call5 = call noundef ptr @_Z3barP1A(ptr noundef %a), !callsite !17
				store ptr %call5, ptr %s, align 8
				%0 = load ptr, ptr %x, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %0, i8 0, i64 10, i1 false)
				%1 = load ptr, ptr %y, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %1, i8 0, i64 10, i1 false)
				%2 = load ptr, ptr %z, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %2, i8 0, i64 10, i1 false)
				%3 = load ptr, ptr %w, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %3, i8 0, i64 10, i1 false)
				%4 = load ptr, ptr %r, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %4, i8 0, i64 10, i1 false)
				%5 = load ptr, ptr %s, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %5, i8 0, i64 10, i1 false)
				%6 = load ptr, ptr %x, align 8
				%isnull = icmp eq ptr %6, null
				br i1 %isnull, label %delete.end, label %delete.notnull

				delete.notnull: ; preds = %entry
				call void @_ZdaPv(ptr noundef %6) #8
				br label %delete.end

				delete.end: ; preds = %delete.notnull, %entry
				%7 = load ptr, ptr %w, align 8
				%isnull6 = icmp eq ptr %7, null
				br i1 %isnull6, label %delete.end8, label %delete.notnull7

				delete.notnull7: ; preds = %delete.end
				call void @_ZdaPv(ptr noundef %7) #8
				br label %delete.end8

				delete.end8: ; preds = %delete.notnull7, %delete.end
				%8 = load ptr, ptr %r, align 8
				%isnull9 = icmp eq ptr %8, null
				br i1 %isnull9, label %delete.end11, label %delete.notnull10

				delete.notnull10: ; preds = %delete.end8
				call void @_ZdaPv(ptr noundef %8) #8
				br label %delete.end11

				delete.end11: ; preds = %delete.notnull10, %delete.end8
				%call12 = call i32 @sleep(i32 noundef 10)
				%9 = load ptr, ptr %y, align 8
				%isnull13 = icmp eq ptr %9, null
				br i1 %isnull13, label %delete.end15, label %delete.notnull14

				delete.notnull14: ; preds = %delete.end11
				call void @_ZdaPv(ptr noundef %9) #8
				br label %delete.end15

				delete.end15: ; preds = %delete.notnull14, %delete.end11
				%10 = load ptr, ptr %z, align 8
				%isnull16 = icmp eq ptr %10, null
				br i1 %isnull16, label %delete.end18, label %delete.notnull17

				delete.notnull17: ; preds = %delete.end15
				call void @_ZdaPv(ptr noundef %10) #8
				br label %delete.end18

				delete.end18: ; preds = %delete.notnull17, %delete.end15
				%11 = load ptr, ptr %s, align 8
				%isnull19 = icmp eq ptr %11, null
				br i1 %isnull19, label %delete.end21, label %delete.notnull20

				delete.notnull20: ; preds = %delete.end18
				call void @_ZdaPv(ptr noundef %11) #8
				br label %delete.end21

				delete.end21: ; preds = %delete.notnull20, %delete.end18
				ret i32 0
				}

				; Function Attrs: noinline nounwind optnone uwtable
				define internal void @_ZN1BC2Ev(ptr noundef nonnull align 8 dereferenceable(8) %this) unnamed_addr #2 comdat align 2 {
				entry:
				%this.addr = alloca ptr, align 8
				store ptr %this, ptr %this.addr, align 8
				%this1 = load ptr, ptr %this.addr, align 8
				call void @_ZN1AC2Ev(ptr noundef nonnull align 8 dereferenceable(8) %this1) #7
				store ptr getelementptr inbounds ({ [3 x ptr] }, ptr @_ZTV1B, i32 0, inrange i32 0, i32 2), ptr %this1, align 8
				ret void
				}

				; Function Attrs: noinline nounwind optnone uwtable
				define internal void @_ZN1AC2Ev(ptr noundef nonnull align 8 dereferenceable(8) %this) unnamed_addr #2 comdat align 2 {
				entry:
				%this.addr = alloca ptr, align 8
				store ptr %this, ptr %this.addr, align 8
				%this1 = load ptr, ptr %this.addr, align 8
				store ptr getelementptr inbounds ({ [3 x ptr] }, ptr @_ZTV1A, i32 0, inrange i32 0, i32 2), ptr %this1, align 8
				ret void
				}

				; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
				declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #3

				; Function Attrs: nobuiltin nounwind
				declare void @_ZdaPv(ptr noundef) #4

				declare i32 @sleep(i32 noundef) #5

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_ZN1A1xEv(ptr noundef nonnull align 8 dereferenceable(8) %this) unnamed_addr #0 comdat align 2 {
				entry:
				%this.addr = alloca ptr, align 8
				store ptr %this, ptr %this.addr, align 8
				%this1 = load ptr, ptr %this.addr, align 8
				%call = call noundef ptr @_Z3foov(), !callsite !18
				ret ptr %call
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_ZN1B1xEv(ptr noundef nonnull align 8 dereferenceable(8) %this) unnamed_addr #0 comdat align 2 {
				entry:
				%this.addr = alloca ptr, align 8
				store ptr %this, ptr %this.addr, align 8
				%this1 = load ptr, ptr %this.addr, align 8
				%call = call noundef ptr @_Z3foov(), !callsite !19
				ret ptr %call
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z3foov() #0 {
				entry:
				%call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #9, !memprof !20, !callsite !33
				ret ptr %call
				}

				; Function Attrs: nobuiltin allocsize(0)
				declare noundef nonnull ptr @_Znam(i64 noundef) #6

				attributes #0 = { mustprogress noinline optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #1 = { mustprogress noinline norecurse optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #2 = { noinline nounwind optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #3 = { nocallback nofree nounwind willreturn memory(argmem: write) }
				attributes #4 = { nobuiltin nounwind "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #5 = { "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #6 = { nobuiltin allocsize(0) "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #7 = { nounwind }
				attributes #8 = { builtin nounwind }
				attributes #9 = { builtin allocsize(0) }

				!llvm.module.flags = !{!4, !5, !6, !7, !8, !9, !10}

				!0 = !{i64 16, !"_ZTS1A"}
				!1 = !{i64 16, !"_ZTSM1AFPcvE.virtual"}
				!2 = !{i64 16, !"_ZTS1B"}
				!3 = !{i64 16, !"_ZTSM1BFPcvE.virtual"}
				!4 = !{i32 7, !"Dwarf Version", i32 5}
				!5 = !{i32 2, !"Debug Info Version", i32 3}
				!6 = !{i32 1, !"wchar_size", i32 4}
				!7 = !{i32 8, !"PIC Level", i32 2}
				!8 = !{i32 7, !"PIE Level", i32 2}
				!9 = !{i32 7, !"uwtable", i32 2}
				!10 = !{i32 7, !"frame-pointer", i32 2}
				!11 = !{i64 -4820244510750103755}
				!12 = !{i64 8632435727821051414}
				!13 = !{i64 -3421689549917153178}
				!14 = !{i64 6792096022461663180}
				!15 = !{i64 -2709642582978494015}
				!16 = !{i64 748269490701775343}
				!17 = !{i64 -5747251260480066785}
				!18 = !{i64 8256774051149711748}
				!19 = !{i64 -4831879094954754638}
				!20 = !{!21, !23, !25, !27, !29, !31}
				!21 = !{!22, !"notcold"}
				!22 = !{i64 2732490490862098848, i64 8256774051149711748, i64 -4820244510750103755, i64 748269490701775343}
				!23 = !{!24, !"cold"}
				!24 = !{i64 2732490490862098848, i64 8256774051149711748, i64 -4820244510750103755, i64 -5747251260480066785}
				!25 = !{!26, !"notcold"}
				!26 = !{i64 2732490490862098848, i64 8632435727821051414}
				!27 = !{!28, !"cold"}
				!28 = !{i64 2732490490862098848, i64 -4831879094954754638, i64 -4820244510750103755, i64 6792096022461663180}
				!29 = !{!30, !"notcold"}
				!30 = !{i64 2732490490862098848, i64 -4831879094954754638, i64 -4820244510750103755, i64 -2709642582978494015}
				!31 = !{!32, !"cold"}
				!32 = !{i64 2732490490862098848, i64 -3421689549917153178}
				!33 = !{i64 2732490490862098848}


				; DUMP: CCG before cloning:
				; DUMP: Callsite Context Graph:
				; DUMP: Node [[FOO:0x[a-z0-9]+]]
				; DUMP: %call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #7 (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2 3 4 5 6
				; DUMP: CalleeEdges:
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[AX:0x[a-z0-9]+]] AllocTypes: NotColdCold ContextIds: 1 2
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN1:0x[a-z0-9]+]] AllocTypes: NotCold ContextIds: 3
				; DUMP: Edge from Callee [[FOO]] to Caller: [[BX:0x[a-z0-9]+]] AllocTypes: NotColdCold ContextIds: 4 5
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN2:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 6

				; DUMP: Node [[AX]]
				; DUMP: %call = call noundef ptr @_Z3foov() (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[AX]] AllocTypes: NotColdCold ContextIds: 1 2
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[AX]] to Caller: [[BAR:0x[a-z0-9]+]] AllocTypes: NotColdCold ContextIds: 1 2

				;; Bar contains an indirect call with multiple targets. Its call should be null.
				; DUMP: Node [[BAR]]
				; DUMP: null Call
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2 4 5
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[AX]] to Caller: [[BAR]] AllocTypes: NotColdCold ContextIds: 1 2
				; DUMP: Edge from Callee [[BX]] to Caller: [[BAR]] AllocTypes: NotColdCold ContextIds: 4 5
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN3:0x[a-z0-9]+]] AllocTypes: NotCold ContextIds: 1
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN4:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 2
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN5:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 4
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN6:0x[a-z0-9]+]] AllocTypes: NotCold ContextIds: 5

				; DUMP: Node [[MAIN3]]
				; DUMP: %call4 = call noundef ptr @_Z3barP1A(ptr noundef %a) (clone 0)
				; DUMP: AllocTypes: NotCold
				; DUMP: ContextIds: 1
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN3]] AllocTypes: NotCold ContextIds: 1
				; DUMP: CallerEdges:

				; DUMP: Node [[MAIN4]]
				; DUMP: %call5 = call noundef ptr @_Z3barP1A(ptr noundef %a) (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN4]] AllocTypes: Cold ContextIds: 2
				; DUMP: CallerEdges:

				; DUMP: Node [[MAIN5]]
				; DUMP: %call2 = call noundef ptr @_Z3barP1A(ptr noundef %b) (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 4
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN5]] AllocTypes: Cold ContextIds: 4
				; DUMP: CallerEdges:

				; DUMP: Node [[MAIN6]]
				; DUMP: %call3 = call noundef ptr @_Z3barP1A(ptr noundef %b) (clone 0)
				; DUMP: AllocTypes: NotCold
				; DUMP: ContextIds: 5
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[MAIN6]] AllocTypes: NotCold ContextIds: 5
				; DUMP: CallerEdges:

				; DUMP: Node [[MAIN1]]
				; DUMP: %call = call noundef ptr @_Z3foov() (clone 0)
				; DUMP: AllocTypes: NotCold
				; DUMP: ContextIds: 3
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN1]] AllocTypes: NotCold ContextIds: 3
				; DUMP: CallerEdges:

				; DUMP: Node [[BX]]
				; DUMP: %call = call noundef ptr @_Z3foov() (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 4 5
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[BX]] AllocTypes: NotColdCold ContextIds: 4 5
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[BX]] to Caller: [[BAR]] AllocTypes: NotColdCold ContextIds: 4 5

				; DUMP: Node [[MAIN2]]
				; DUMP: %call1 = call noundef ptr @_Z3foov() (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 6
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN2]] AllocTypes: Cold ContextIds: 6
				; DUMP: CallerEdges:


				; DOT: digraph CallsiteContextGraph {
				; DOT: N[[FOO:0x[a-z0-9]+]] [shape="record",label="OrigId: Alloc0\n_Z3foov -\> _Znam",tooltip="N[[FOO]] ContextIds: 1 2 3 4 5 6",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[MAIN1:0x[a-z0-9]+]] [shape="record",label="OrigId: 15025054523792398438\nmain -\> _Z3foov",tooltip="N[[MAIN1]] ContextIds: 6",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOT: N[[BX:0x[a-z0-9]+]] [shape="record",label="OrigId: 13614864978754796978\n_ZN1B1xEv -\> _Z3foov",tooltip="N[[BX]] ContextIds: 4 5",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				;; Bar contains an indirect call with multiple targets. Its call should be null.
				; DOT: N[[BAR:0x[a-z0-9]+]] [shape="record",label="OrigId: 13626499562959447861\nnull call (external)",tooltip="N[[BAR]] ContextIds: 1 2 4 5",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[MAIN2:0x[a-z0-9]+]] [shape="record",label="OrigId: 15737101490731057601\nmain -\> _Z3barP1A",tooltip="N[[MAIN2]] ContextIds: 5",fillcolor="brown1",style="filled",style="filled"]; // callsite, default
				; DOT: N[[MAIN3:0x[a-z0-9]+]] [shape="record",label="OrigId: 6792096022461663180\nmain -\> _Z3barP1A",tooltip="N[[MAIN3]] ContextIds: 4",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOT: N[[MAIN4:0x[a-z0-9]+]] [shape="record",label="OrigId: 12699492813229484831\nmain -\> _Z3barP1A",tooltip="N[[MAIN4]] ContextIds: 2",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOT: N[[MAIN5:0x[a-z0-9]+]] [shape="record",label="OrigId: 748269490701775343\nmain -\> _Z3barP1A",tooltip="N[[MAIN5]] ContextIds: 1",fillcolor="brown1",style="filled",style="filled"]; // callsite, default
				; DOT: N[[MAIN6:0x[a-z0-9]+]] [shape="record",label="OrigId: 8632435727821051414\nmain -\> _Z3foov",tooltip="N[[MAIN6]] ContextIds: 3",fillcolor="brown1",style="filled",style="filled"]; // callsite, default
				; DOT: N[[AX:0x[a-z0-9]+]] [shape="record",label="OrigId: 8256774051149711748\n_ZN1A1xEv -\> _Z3foov",tooltip="N[[AX]] ContextIds: 1 2",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: // Edges:
				; DOT: N[[AX]] -> N[[FOO]][tooltip=" ContextIds: 1 2",fillcolor="mediumorchid1"]; // default|cold
				; DOT: N[[MAIN6]] -> N[[FOO]][tooltip=" ContextIds: 3",fillcolor="brown1"]; // default
				; DOT: N[[BX]] -> N[[FOO]][tooltip=" ContextIds: 4 5",fillcolor="mediumorchid1"]; // default|cold
				; DOT: N[[MAIN1]] -> N[[FOO]][tooltip=" ContextIds: 6",fillcolor="cyan"]; // cold
				; DOT: N[[BAR]] -> N[[BX]][tooltip=" ContextIds: 4 5",fillcolor="mediumorchid1"]; // default|cold
				; DOT: N[[MAIN5]] -> N[[BAR]][tooltip=" ContextIds: 1",fillcolor="brown1"]; // default
				; DOT: N[[MAIN4]] -> N[[BAR]][tooltip=" ContextIds: 2",fillcolor="cyan"]; // cold
				; DOT: N[[MAIN3]] -> N[[BAR]][tooltip=" ContextIds: 4",fillcolor="cyan"]; // cold
				; DOT: N[[MAIN2]] -> N[[BAR]][tooltip=" ContextIds: 5",fillcolor="brown1"]; // default
				; DOT: N[[BAR]] -> N[[AX]][tooltip=" ContextIds: 1 2",fillcolor="mediumorchid1"]; // default|cold
				; DOT: }
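
The AllocTypes shown on each node and edge in the dump above are consistent with a bitwise OR over the allocation types of the context ids the node or edge carries. The sketch below is a minimal model of that aggregation, not the pass's code: the enum values and the helper names (combinedAllocType, allocTypeName) are assumptions made for illustration, and the per-context types are read off the single-context edges in the expected output (1 NotCold, 2 Cold, 3 NotCold, 4 Cold, 5 NotCold, 6 Cold).

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Bit flags named after the strings printed in the dump. The numeric values
// are an assumption of this sketch.
enum AllocType : uint8_t { None = 0, NotCold = 1, Cold = 2 };

// OR together the allocation type of every context id on a node or edge.
// Throws std::out_of_range if an id has no recorded type.
uint8_t combinedAllocType(const std::set<uint32_t> &contextIds,
                          const std::map<uint32_t, AllocType> &typeOfContext) {
  uint8_t combined = None;
  for (uint32_t id : contextIds)
    combined |= typeOfContext.at(id);
  return combined;
}

// Render the combined flags the way the dump spells them.
std::string allocTypeName(uint8_t flags) {
  switch (flags) {
  case NotCold:
    return "NotCold";
  case Cold:
    return "Cold";
  case NotCold | Cold:
    return "NotColdCold";
  default:
    return "None";
  }
}
```

With the per-context types listed above, the BAR node (context ids 1 2 4 5) comes out as NotColdCold, while the MAIN2 node (context id 6) stays Cold, matching the DUMP lines.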

llvm/test/Transforms/PGHOContextDisambiguation/inlined.ll

This file was added.

				;; Test callsite context graph generation for call graph with two memprof
				;; contexts and partial inlining, requiring generation of a new fused node to
				;; represent the inlined sequence while matching callsite nodes onto the graph.
				;;
				;; Original code looks like:
				;;
				;; char *bar() {
				;; return new char[10];
				;; }
				;;
				;; char *baz() {
				;; return bar();
				;; }
				;;
				;; char *foo() {
				;; return baz();
				;; }
				;;
				;; int main(int argc, char **argv) {
				;; char *x = foo();
				;; char *y = foo();
				;; memset(x, 0, 10);
				;; memset(y, 0, 10);
				;; delete[] x;
				;; sleep(10);
				;; delete[] y;
				;; return 0;
				;; }
				;;
				;; Code compiled with -mllvm -memprof-min-lifetime-cold-threshold=5 so that the
				;; memory freed after sleep(10) results in cold lifetimes.
				;;
				;; The code below was created by forcing inlining of baz into foo, and
				;; bar into baz. Due to the inlining of bar we will initially have two
				;; allocation nodes in the graph. This tests that we correctly match
				;; foo (with baz inlined) onto the graph nodes first, and generate a new
				;; fused node for it. We should then not match baz (with bar inlined) as that
				;; is not reached by the MIB contexts (since all calls from main will look
				;; like main -> foo(+baz) -> bar after the inlining reflected in this IR).

				; RUN: opt -passes=pgho-context-disambiguation \
				; RUN: -pgho-verify-ccg -pgho-verify-nodes -pgho-dump-ccg \
				; RUN: -pgho-export-to-dot -pgho-dot-file-path-prefix=%t. \
				; RUN: %s -S 2>&1 | FileCheck %s --check-prefix=DUMP

				; RUN: cat %t.ccg.postbuild.dot | FileCheck %s --check-prefix=DOT

				; ModuleID = 'inlined.ll'
				source_filename = "inlined.ll"
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: mustprogress uwtable
				define internal noundef ptr @_Z3barv() #0 {
				entry:
				%call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #7, !memprof !7, !callsite !12
				ret ptr %call
				}

				; Function Attrs: nobuiltin allocsize(0)
				declare noundef nonnull ptr @_Znam(i64 noundef) #1

				; Function Attrs: mustprogress uwtable
				define internal noundef ptr @_Z3bazv() #0 {
				entry:
				%call.i = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #7, !memprof !7, !callsite !13
				ret ptr %call.i
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define internal noundef ptr @_Z3foov() #2 {
				entry:
				%call.i = call noundef ptr @_Z3barv(), !callsite !14
				ret ptr %call.i
				}

				; Function Attrs: mustprogress noinline norecurse optnone uwtable
				define dso_local noundef i32 @main(i32 noundef %argc, ptr noundef %argv) #3 {
				entry:
				%retval = alloca i32, align 4
				%argc.addr = alloca i32, align 4
				%argv.addr = alloca ptr, align 8
				%x = alloca ptr, align 8
				%y = alloca ptr, align 8
				store i32 0, ptr %retval, align 4
				store i32 %argc, ptr %argc.addr, align 4
				store ptr %argv, ptr %argv.addr, align 8
				%call = call noundef ptr @_Z3foov(), !callsite !15
				store ptr %call, ptr %x, align 8
				%call1 = call noundef ptr @_Z3foov(), !callsite !16
				store ptr %call1, ptr %y, align 8
				%0 = load ptr, ptr %x, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %0, i8 0, i64 10, i1 false)
				%1 = load ptr, ptr %y, align 8
				call void @llvm.memset.p0.i64(ptr align 1 %1, i8 0, i64 10, i1 false)
				%2 = load ptr, ptr %x, align 8
				%isnull = icmp eq ptr %2, null
				br i1 %isnull, label %delete.end, label %delete.notnull

				delete.notnull: ; preds = %entry
				call void @_ZdaPv(ptr noundef %2) #8
				br label %delete.end

				delete.end: ; preds = %delete.notnull, %entry
				%call2 = call i32 @sleep(i32 noundef 10)
				%3 = load ptr, ptr %y, align 8
				%isnull3 = icmp eq ptr %3, null
				br i1 %isnull3, label %delete.end5, label %delete.notnull4

				delete.notnull4: ; preds = %delete.end
				call void @_ZdaPv(ptr noundef %3) #8
				br label %delete.end5

				delete.end5: ; preds = %delete.notnull4, %delete.end
				ret i32 0
				}

				; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
				declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #4

				; Function Attrs: nobuiltin nounwind
				declare void @_ZdaPv(ptr noundef) #5

				declare i32 @sleep(i32 noundef) #6

				attributes #0 = { mustprogress uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #1 = { nobuiltin allocsize(0) "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #2 = { mustprogress noinline optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #3 = { mustprogress noinline norecurse optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #4 = { nocallback nofree nounwind willreturn memory(argmem: write) }
				attributes #5 = { nobuiltin nounwind "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #6 = { "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #7 = { builtin allocsize(0) }
				attributes #8 = { builtin nounwind }

				!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}

				!0 = !{i32 7, !"Dwarf Version", i32 5}
				!1 = !{i32 2, !"Debug Info Version", i32 3}
				!2 = !{i32 1, !"wchar_size", i32 4}
				!3 = !{i32 8, !"PIC Level", i32 2}
				!4 = !{i32 7, !"PIE Level", i32 2}
				!5 = !{i32 7, !"uwtable", i32 2}
				!6 = !{i32 7, !"frame-pointer", i32 2}
				!7 = !{!8, !10}
				!8 = !{!9, !"notcold"}
				!9 = !{i64 9086428284934609951, i64 -5964873800580613432, i64 2732490490862098848, i64 8632435727821051414}
				!10 = !{!11, !"cold"}
				!11 = !{i64 9086428284934609951, i64 -5964873800580613432, i64 2732490490862098848, i64 -3421689549917153178}
				!12 = !{i64 9086428284934609951}
				!13 = !{i64 9086428284934609951, i64 -5964873800580613432}
				!14 = !{i64 -5964873800580613432, i64 2732490490862098848}
				!15 = !{i64 8632435727821051414}
				!16 = !{i64 -3421689549917153178}


				; DUMP: CCG before cloning:
				; DUMP: Callsite Context Graph:
				; DUMP: Node [[BAR:0x[a-z0-9]+]]
				; DUMP: %call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #7 (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2
				; DUMP: CalleeEdges:
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[FOO:0x[a-z0-9]+]] AllocTypes: NotColdCold ContextIds: 1 2

				;; This is the node synthesized for the call to bar in foo that was created
				;; by inlining baz into foo.
				; DUMP: Node [[FOO]]
				; DUMP: %call.i = call noundef ptr @_Z3barv() (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 1 2
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAR]] to Caller: [[FOO]] AllocTypes: NotColdCold ContextIds: 1 2
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN1:0x[a-z0-9]+]] AllocTypes: NotCold ContextIds: 1
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN2:0x[a-z0-9]+]] AllocTypes: Cold ContextIds: 2

				; DUMP: Node [[MAIN1]]
				; DUMP: %call = call noundef ptr @_Z3foov() (clone 0)
				; DUMP: AllocTypes: NotCold
				; DUMP: ContextIds: 1 3
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO2:0x[a-z0-9]+]] to Caller: [[MAIN1]] AllocTypes: NotCold ContextIds: 3
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN1]] AllocTypes: NotCold ContextIds: 1
				; DUMP: CallerEdges:

				; DUMP: Node [[MAIN2]]
				; DUMP: %call1 = call noundef ptr @_Z3foov() (clone 0)
				; DUMP: AllocTypes: Cold
				; DUMP: ContextIds: 2 4
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[FOO2]] to Caller: [[MAIN2]] AllocTypes: Cold ContextIds: 4
				; DUMP: Edge from Callee [[FOO]] to Caller: [[MAIN2]] AllocTypes: Cold ContextIds: 2
				; DUMP: CallerEdges:

				; DUMP: Node [[BAZ:0x[a-z0-9]+]]
				; DUMP: %call.i = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #7 (clone 0)
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 3 4
				; DUMP: CalleeEdges:
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[BAZ]] to Caller: [[FOO2]] AllocTypes: NotColdCold ContextIds: 3 4

				;; This is leftover from the MIB on the alloc inlined into baz. It is not
				;; matched with any call, since there is no such node in the IR. Due to the
				;; null call it will not participate in any context transformations.
				; DUMP: Node [[FOO2]]
				; DUMP: null Call
				; DUMP: AllocTypes: NotColdCold
				; DUMP: ContextIds: 3 4
				; DUMP: CalleeEdges:
				; DUMP: Edge from Callee [[BAZ]] to Caller: [[FOO2]] AllocTypes: NotColdCold ContextIds: 3 4
				; DUMP: CallerEdges:
				; DUMP: Edge from Callee [[FOO2]] to Caller: [[MAIN1]] AllocTypes: NotCold ContextIds: 3
				; DUMP: Edge from Callee [[FOO2]] to Caller: [[MAIN2]] AllocTypes: Cold ContextIds: 4


				; DOT: digraph CallsiteContextGraph {
				; DOT: N[[BAZ:0x[a-z0-9]+]] [shape="record",label="OrigId: Alloc2\n_Z3bazv -\> _Znam",tooltip="N[[BAZ]] ContextIds: 3 4",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[FOO2:0x[a-z0-9]+]] [shape="record",label="OrigId: 2732490490862098848\nnull call (external)",tooltip="N[[FOO2]] ContextIds: 3 4",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[MAIN2:0x[a-z0-9]+]] [shape="record",label="OrigId: 15025054523792398438\nmain -\> _Z3foov",tooltip="N[[MAIN2]] ContextIds: 2 4",fillcolor="cyan",style="filled",style="filled"]; // callsite, cold
				; DOT: N[[MAIN1:0x[a-z0-9]+]] [shape="record",label="OrigId: 8632435727821051414\nmain -\> _Z3foov",tooltip="N[[MAIN1]] ContextIds: 1 3",fillcolor="brown1",style="filled",style="filled"]; // callsite, default
				; DOT: N[[BAR:0x[a-z0-9]+]] [shape="record",label="OrigId: Alloc0\n_Z3barv -\> _Znam",tooltip="N[[BAR]] ContextIds: 1 2",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: N[[FOO:0x[a-z0-9]+]] [shape="record",label="OrigId: 0\n_Z3foov -\> _Z3barv",tooltip="N[[FOO]] ContextIds: 1 2",fillcolor="mediumorchid1",style="filled",style="filled"]; // callsite, default|cold
				; DOT: // Edges:
				; DOT: N[[FOO2]] -> N[[BAZ]][tooltip=" ContextIds: 3 4",fillcolor="mediumorchid1"]; // default|cold
				; DOT: N[[MAIN1]] -> N[[FOO2]][tooltip=" ContextIds: 3",fillcolor="brown1"]; // default
				; DOT: N[[MAIN2]] -> N[[FOO2]][tooltip=" ContextIds: 4",fillcolor="cyan"]; // cold
				; DOT: N[[FOO]] -> N[[BAR]][tooltip=" ContextIds: 1 2",fillcolor="mediumorchid1"]; // default|cold
				; DOT: N[[MAIN1]] -> N[[FOO]][tooltip=" ContextIds: 1",fillcolor="brown1"]; // default
				; DOT: N[[MAIN2]] -> N[[FOO]][tooltip=" ContextIds: 2",fillcolor="cyan"]; // cold
				; DOT: }
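
The matching described in the inlined.ll comments can be followed by hand on the stack ids in the metadata. The sketch below is a simplified model of that reasoning, not the pass's matching code: the helper names (startsWith, dropFront, asOrigId) are hypothetical. It strips the allocation's own callsite frames from an MIB context and checks whether a call's callsite ids line up with the frames that remain. It also shows that the unsigned OrigId values in the dot output are the same 64-bit ids that the IR metadata prints as signed i64.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using StackId = int64_t; // stack ids as printed (signed) in the IR metadata

// True when `prefix` matches the leading frames of `frames`.
bool startsWith(const std::vector<StackId> &frames,
                const std::vector<StackId> &prefix) {
  return prefix.size() <= frames.size() &&
         std::equal(prefix.begin(), prefix.end(), frames.begin());
}

// Copy of `frames` without its first `n` entries. Requires n <= frames.size().
std::vector<StackId> dropFront(const std::vector<StackId> &frames,
                               std::size_t n) {
  return std::vector<StackId>(
      frames.begin() + static_cast<std::ptrdiff_t>(n), frames.end());
}

// Reinterpret a signed id as the unsigned value shown as OrigId in the dot file.
uint64_t asOrigId(StackId id) { return static_cast<uint64_t>(id); }
```

Taking !9 as the MIB context: dropping the one-frame callsite !12 of the allocation in bar leaves frames that begin with foo's callsite !14, so that call can be matched and fused into a single node. Dropping the two-frame callsite !13 of the allocation inlined into baz leaves frames beginning with 2732490490862098848 alone, which no call's callsite metadata matches, consistent with the null-call node in the expected output. Likewise, asOrigId(-3421689549917153178) yields 15025054523792398438, the OrigId printed for the cold call in main.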

llvm/test/Transforms/PGHOContextDisambiguation/pass-pipeline.ll

This file was added.

				;; Test that PGHOContextDisambiguation is enabled under the expected conditions
				;; and in the expected position.

				;; Pass is not currently enabled by default at any opt level.
				; RUN: opt -debug-pass-manager -passes='lto<O0>' -S %s \
				; RUN: 2>&1 | FileCheck %s --implicit-check-not="Running pass: PGHOContextDisambiguation"
				; RUN: opt -debug-pass-manager -passes='lto<O1>' -S %s \
				; RUN: 2>&1 | FileCheck %s --implicit-check-not="Running pass: PGHOContextDisambiguation"
				; RUN: opt -debug-pass-manager -passes='lto<O2>' -S %s \
				; RUN: 2>&1 | FileCheck %s --implicit-check-not="Running pass: PGHOContextDisambiguation"
				; RUN: opt -debug-pass-manager -passes='lto<O3>' -S %s \
				; RUN: 2>&1 | FileCheck %s --implicit-check-not="Running pass: PGHOContextDisambiguation"

				;; Pass should not run at O0/O1 even when the enabling option is given.
				; RUN: opt -debug-pass-manager -passes='lto<O0>' -S %s \
				; RUN: -enable-pgho-context-disambiguation \
				; RUN: 2>&1 | FileCheck %s --implicit-check-not="Running pass: PGHOContextDisambiguation"
				; RUN: opt -debug-pass-manager -passes='lto<O1>' -S %s \
				; RUN: -enable-pgho-context-disambiguation \
				; RUN: 2>&1 | FileCheck %s --implicit-check-not="Running pass: PGHOContextDisambiguation"

				;; Pass should run at O2/O3 when the enabling option is given.
				; RUN: opt -debug-pass-manager -passes='lto<O2>' -S %s \
				; RUN: -enable-pgho-context-disambiguation \
				; RUN: 2>&1 | FileCheck %s --check-prefix=ENABLED
				; RUN: opt -debug-pass-manager -passes='lto<O3>' -S %s \
				; RUN: -enable-pgho-context-disambiguation \
				; RUN: 2>&1 | FileCheck %s --check-prefix=ENABLED

				;; When enabled, PGHOContextDisambiguation runs just after inlining.
				; ENABLED: Running pass: InlinerPass
				; ENABLED: Invalidating analysis: InlineAdvisorAnalysis
				; ENABLED: Running pass: PGHOContextDisambiguation

				define noundef ptr @_Z3barv() {
				entry:
				%call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10)
				ret ptr %call
				}

				declare noundef nonnull ptr @_Znam(i64 noundef)
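
The RUN lines in pass-pipeline.ll pin down a simple truth table: the pass is expected in the LTO pipeline only when -enable-pgho-context-disambiguation is given and the level is O2 or O3. A minimal sketch of that condition follows. It models the tested behavior only and is not the PassBuilderPipelines.cpp code; the OptLevel enum and the function name are hypothetical.

```cpp
// Hypothetical stand-in for the optimization level in lto<O0> .. lto<O3>.
enum class OptLevel { O0, O1, O2, O3 };

// Model of the behavior the test checks: the pass runs only when the option
// is set and the level is at least O2.
bool expectContextDisambiguation(OptLevel level, bool enableOption) {
  return enableOption && level >= OptLevel::O2;
}
```

Each of the eight opt invocations in the test corresponds to one (level, option) pair, and only the two pairs with the option set at O2 or O3 use the ENABLED check prefix.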

Diff 500320