This is an archive of the discontinued LLVM Phabricator instance.

Differential D95988

[CSSPGO] Process functions in a top-down order on a dynamic call graph.
Closed, Public

Authored by hoy on Feb 3 2021, 5:00 PM.

Details

Reviewers

wmi
davidxl
wenlei

Commits

rGde40f6d6230e: [CSSPGO] Process functions in a top-down order on a dynamic call graph.

Summary

Functions are currently processed by the sample profiler loader in a top-down order defined by the static call graph. The order is being adjusted to be a top-down order based on the input context-sensitive profile. One benefit is that the processing order of caller and callee in one SCC would follow the context order in the profile to favor more inlining. Another benefit is that the processing order of caller and callee through an indirect call (which is not on the static call graph) can be honored which in turn allows for more inlining.

Two switches -mllvm -use-profile-indirect-call-edges and -mllvm -use-profile-top-down-order are being introduced. Both are on by default.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hoy created this revision. Feb 3 2021, 5:00 PM

Herald added subscribers: wenlei, mgrang, hiraditya. Feb 3 2021, 5:00 PM

hoy requested review of this revision. Feb 3 2021, 5:00 PM

Herald added a project: Restricted Project. Feb 3 2021, 5:00 PM

Herald added a subscriber: llvm-commits.

hoy added reviewers: wmi, davidxl, wenlei. Feb 3 2021, 5:08 PM

Herald added a subscriber: ormris. Feb 3 2021, 5:08 PM

Harbormaster completed remote builds in B87808: Diff 321265. Feb 3 2021, 7:03 PM

What is the main difference between dynamic call graph based inlining vs static call graph + priority based inlining (https://reviews.llvm.org/D94001)?

The work in this change is an enhancement to the priority-based inliner in that:

Honor profile SCC traversal order for more inlining. E.g., when there is a cycle in the static call graph, say A->B->C->A, the static SCC traversal order could be any deterministic order, let's say B->C->A. If at runtime we see a context A->B->C->A->B->C, llvm-profgen may compress it into A->B->C. By walking the SCC in the B->C->A top-down order in the sample profile loader, we will not get B inlined into A, since B is processed before its caller A. This change adjusts the SCC processing order to reflect what is in the profile so that A->B->C will be walked in order.

Honor indirect call edge order. Similarly, given an indirect call A->B at runtime that is missing from the static call graph, B may end up being processed before A. We'd like A to be processed before B so that B gets a chance to be inlined into A after indirect call promotion.
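
The two points above can be illustrated with a small standalone sketch. It is not the code in this revision: `orderByProfileEdges` and the string-based graph are hypothetical stand-ins, and the sketch assumes the profile edges among the listed functions are acyclic. Given the static order of one SCC plus caller -> callee edges taken from the profile, it emits callers before callees and leaves unconstrained functions in their static order.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model only: function names stand in for call graph nodes.
using Edge = std::pair<std::string, std::string>; // caller -> callee

// Hypothetical helper: order functions so that every caller named by a
// profile edge comes before its callee. Functions that no profile edge
// constrains keep their position relative to the static order.
std::vector<std::string>
orderByProfileEdges(const std::vector<std::string> &StaticOrder,
                    const std::vector<Edge> &ProfileEdges) {
  std::map<std::string, int> InDegree;
  std::map<std::string, std::vector<std::string>> Callees;
  for (const std::string &F : StaticOrder)
    InDegree[F] = 0;
  for (const Edge &E : ProfileEdges) {
    Callees[E.first].push_back(E.second);
    ++InDegree[E.second];
  }

  std::vector<std::string> Result;
  std::set<std::string> Emitted;
  // Kahn's algorithm; ties are broken by position in StaticOrder.
  while (Result.size() < StaticOrder.size()) {
    bool Progress = false;
    for (const std::string &F : StaticOrder) {
      if (Emitted.count(F) || InDegree[F] != 0)
        continue;
      Result.push_back(F);
      Emitted.insert(F);
      for (const std::string &Callee : Callees[F])
        --InDegree[Callee];
      Progress = true;
      break; // rescan so that earlier static entries win ties
    }
    if (!Progress) {
      // The profile edges formed a cycle: fall back to the static order
      // for whatever is left.
      for (const std::string &F : StaticOrder)
        if (!Emitted.count(F))
          Result.push_back(F);
      break;
    }
  }
  assert(Result.size() == StaticOrder.size());
  return Result;
}
```

With the static order B, C, A and the profile edges A->B and B->C, the helper yields A, B, C. With the static order B, A and a single profile edge A->B standing in for an indirect call, it yields A, B.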

Thanks for the explanation. I have two questions.

Since static call graph edges have been removed, many cold callsites of small functions won't be inlined anymore. Those will all be left to CGSCC inlining. I understand that inlining those small functions may not help reduce caller size and boost more inlining, because there is no cleanup after inlining in the sample loader pass. But that is a big difference from the current early inlining model. Do you see any impact from it?

The two benefits of the dynamic call graph above also apply to non-CSSPGO profiles. Is it possible to make it optional for non-CSSPGO profiles so we can try it?

For #1, this change only adds edges to the static call graph to enforce additional order. It doesn't remove existing call edges, so it shouldn't block previous profile-based inlining. Inlining for cold callsites that exist in the profile will be honored with this change. Inlining for callsites that are not recorded in the profile will mostly be done by CGSCC.

For #2, it's a good point. I believe the two benefits should also help non-CS profile-based inlining. Though I haven't tried that, IIUC, the counts returned for non-inlined callees should be of better quality. That said, I'm not sure non-CS profile-based inlining can benefit as much as CSSPGO does, since I haven't seen the issues that motivated this diff with non-CSSPGO. The main reason is that the CSSPGO inliner works on a context trie that requires an explicit trie edge to be present when processing a call edge. This isn't required with a non-CS profile, where the equivalent of context trie edges is nested in the current function's profile.
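
To make the trie-edge point concrete, here is a minimal sketch of how caller -> callee edges could be read off a context trie. It is only an illustration under assumptions: `TrieNode`, `addContext` and `collectProfileEdges` are hypothetical names, and the real trie in SampleContextTracker also keys children by callsite location, which this toy version ignores.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a context trie node: one node per calling
// context, with children keyed by callee name only.
struct TrieNode {
  std::string FuncName;
  std::map<std::string, std::unique_ptr<TrieNode>> Children;
};

using EdgeSet = std::set<std::pair<std::string, std::string>>;

// Insert one calling context (outermost caller first) below the root.
void addContext(TrieNode &Root, const std::vector<std::string> &Context) {
  TrieNode *Node = &Root;
  for (const std::string &Name : Context) {
    std::unique_ptr<TrieNode> &Child = Node->Children[Name];
    if (!Child) {
      Child.reset(new TrieNode());
      Child->FuncName = Name;
    }
    Node = Child.get();
  }
}

// Walk the trie and record a caller -> callee edge for every parent/child
// pair. The synthetic root is not a function, so it contributes no edge.
void collectProfileEdges(const TrieNode &Node, bool IsRoot, EdgeSet &Edges) {
  for (const auto &Entry : Node.Children) {
    assert(Entry.second && "children are created on insertion");
    const TrieNode &Child = *Entry.second;
    if (!IsRoot)
      Edges.insert({Node.FuncName, Child.FuncName});
    collectProfileEdges(Child, /*IsRoot=*/false, Edges);
  }
}
```

Adding the contexts main -> A -> B and main -> A -> C and then calling `collectProfileEdges(Root, true, Edges)` produces the edges main -> A, A -> B and A -> C. Edges of this kind are what gets layered on top of the static call graph to constrain the processing order.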

I may have misunderstood, but I was referring to replaceCallGraphEdges. The comment inside says:

// Remove static call edges from the call graph except for the ones from the
// root which make the call graph connected.

Yes, I agree that CSSPGO profiles may benefit more from this change than non-CSSPGO profiles. It looks like a good enhancement for both. Do you think it is doable to make the change oblivious to the profile type (CS or non-CS)?

Sure, I think it's doable to add profile call edges for non-CS profiles. The profile context extraction code will look different from the current implementation.

Now that I look more closely, I see that replaceCallGraphEdges is only used on a temporary call graph that is separate from the static call graph, so you are right.

Thanks!

Extending profile-based top-down order support to non-CS profiles. Only adding support for SCCs. Indirect call edges are not needed since uninlined counts are not returned to indirect call targets with non-CS profiles.

Harbormaster completed remote builds in B88583: Diff 322597. Feb 9 2021, 11:22 PM

Thanks for adding the support for non-CS profile!

Indirect call edges are still helpful for non-CS profiles. That is because top-down inlining fundamentally helps non-CS profile matching (unless the annotated profile can be updated repeatedly, but that is not the case for branch probability with non-CS profiles). With a non-top-down order, a function may be annotated with the outlined instance's profile before it can be inlined and get a more precise profile with context. Because there are no indirect call edges in the static call graph, it will be helpful to add them based on the dynamic call graph, to enforce the top-down inlining order more thoroughly.

llvm/lib/Transforms/IPO/SampleProfile.cpp
2363–2369	We may not need this block. Top-down inlining for non-CS profiles is meant to get more precise profile matching, not to enable more inlining. If there is no inline instance profile for a callsite, early inlining in the sample loader won't inline it, so it doesn't need to be added to the dynamic call graph.

Yeah, it is fundamentally useful, but I'm not sure there's a way to justify the benefit right now. If you look at the code below, the nested callee profile is never returned to the outlined instance when inlining of an indirect call is unsuccessful. I was thinking about a separate change to enable that as well as top-down order for indirect calls. What do you think?

https://github.com/llvm/llvm-project/blob/f8772da8cc9a0be65c9ba028c2b5a895c1ed4f91/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1346

llvm/lib/Transforms/IPO/SampleProfile.cpp
2363–2369	Agreed, this is not needed since it adds no benefit to returning profile counts.

I understand that returning profile counts is a benefit of top-down inlining, but it is all related to cold profiles, so it may not be the major factor here? I think the major reason top-down inlining brings benefit, at least for non-CS profiles, is that it lets an inlined function be annotated with the inline instance profile with context instead of the outline instance profile without context. This is the original patch from Wenlei to add top-down inlining support: https://reviews.llvm.org/D70655

Sure, it is OK to address it in a separate change. Better to add a TODO in the comment.

I see. Thanks for the explanation. Yes, the annotation quality for inlined instances is more important than for outlined instances. TODO added for indirect calls.

Addressing Wei's feedback.

wmi added inline comments. Feb 11 2021, 11:39 AM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2491	Many nodes in the static graph may not exist in ProfileOrderMap, so they all get the same 0 value from the map. Better to use llvm::stable_sort.

hoy added inline comments. Feb 11 2021, 11:53 AM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2491	Good catch! A stable sort should minimize changes to the existing static order.
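
A small sketch of why stability matters here. This is not the code at line 2491: `sortByProfileRank` and the rank map are hypothetical, and `std::stable_sort` is used directly (`llvm::stable_sort` is a range-based wrapper around it). Functions missing from the map all compare equal with rank 0, and a stable sort keeps such ties in their incoming static order.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: reorder Nodes by profile rank. Functions absent from
// ProfileRank get rank 0, so many of them tie; std::stable_sort preserves
// the incoming (static) order among tied elements.
void sortByProfileRank(std::vector<std::string> &Nodes,
                       const std::map<std::string, int> &ProfileRank) {
  auto RankOf = [&](const std::string &F) {
    auto It = ProfileRank.find(F);
    return It == ProfileRank.end() ? 0 : It->second;
  };
  auto Less = [&](const std::string &L, const std::string &R) {
    return RankOf(L) < RankOf(R);
  };
  std::stable_sort(Nodes.begin(), Nodes.end(), Less);
  assert(std::is_sorted(Nodes.begin(), Nodes.end(), Less));
}
```

Starting from the static order X, B, Y, A, Z with ranks A=1 and B=2, the result is X, Y, Z, A, B: the unranked functions stay in their static order relative to each other. A non-stable sort would be free to permute X, Y and Z.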

Addressing Wei's comment.

LGTM.

This revision is now accepted and ready to land. Feb 11 2021, 12:07 PM

This revision was landed with ongoing or failed builds. Feb 11 2021, 12:40 PM

Closed by commit rGde40f6d6230e: [CSSPGO] Process functions in a top-down order on a dynamic call graph. (authored by hoy).

This revision was automatically updated to reflect the committed changes.

hoy added a commit: rGde40f6d6230e: [CSSPGO] Process functions in a top-down order on a dynamic call graph.

Harbormaster completed remote builds in B88860: Diff 323085. Feb 11 2021, 5:14 PM

Harbormaster completed remote builds in B88872: Diff 323108. Feb 11 2021, 6:40 PM

wmi mentioned this in D99351: [CSSPGO] Top-down processing order based on full profile. Mar 26 2021, 1:54 PM

hoy mentioned this in rG3e3fc431dfe4: [CSSPGO] Top-down processing order based on full profile. Mar 30 2021, 10:43 AM

Revision Contents

Path                                                                    Size
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h                 13 lines
llvm/lib/Transforms/IPO/SampleContextTracker.cpp                        32 lines
llvm/lib/Transforms/IPO/SampleProfile.cpp                               137 lines
llvm/test/Transforms/SampleProfile/Inputs/profile-context-order.prof    38 lines
llvm/test/Transforms/SampleProfile/Inputs/profile-topdown-order.prof    36 lines
llvm/test/Transforms/SampleProfile/profile-context-order.ll             190 lines
llvm/test/Transforms/SampleProfile/profile-topdown-order.ll             179 lines

Diff 323116

llvm/include/llvm/Transforms/IPO/SampleContextTracker.h

	Show All 12 Lines
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_TRANSFORMS_IPO_SAMPLECONTEXTTRACKER_H			#ifndef LLVM_TRANSFORMS_IPO_SAMPLECONTEXTTRACKER_H
	#define LLVM_TRANSFORMS_IPO_SAMPLECONTEXTTRACKER_H			#define LLVM_TRANSFORMS_IPO_SAMPLECONTEXTTRACKER_H

	#include "llvm/ADT/SmallSet.h"			#include "llvm/ADT/SmallSet.h"
	#include "llvm/ADT/StringMap.h"			#include "llvm/ADT/StringMap.h"
	#include "llvm/ADT/StringRef.h"			#include "llvm/ADT/StringRef.h"
				#include "llvm/Analysis/CallGraph.h"
	#include "llvm/IR/DebugInfoMetadata.h"			#include "llvm/IR/DebugInfoMetadata.h"
	#include "llvm/IR/Instructions.h"			#include "llvm/IR/Instructions.h"
	#include "llvm/ProfileData/SampleProf.h"			#include "llvm/ProfileData/SampleProf.h"
	#include <list>			#include <list>
	#include <map>			#include <map>
	#include <vector>			#include <vector>

	using namespace llvm;			using namespace llvm;
	▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
	// provides interfaces used by sample profile loader to query context profile or			// provides interfaces used by sample profile loader to query context profile or
	// base profile for given function or location; it also manages context tree			// base profile for given function or location; it also manages context tree
	// manipulation that is needed to accommodate inline decisions so we have			// manipulation that is needed to accommodate inline decisions so we have
	// accurate post-inline profile for functions. Internally context profiles			// accurate post-inline profile for functions. Internally context profiles
	// are organized in a trie, with each node representing profile for specific			// are organized in a trie, with each node representing profile for specific
	// calling context and the context is identified by path from root to the node.			// calling context and the context is identified by path from root to the node.
	class SampleContextTracker {			class SampleContextTracker {
	public:			public:
				using ContextSamplesTy = SmallSet<FunctionSamples *, 16>;

	SampleContextTracker(StringMap<FunctionSamples> &Profiles);			SampleContextTracker(StringMap<FunctionSamples> &Profiles);
	// Query context profile for a specific callee with given name at a given			// Query context profile for a specific callee with given name at a given
	// call-site. The full context is identified by location of call instruction.			// call-site. The full context is identified by location of call instruction.
	FunctionSamples *getCalleeContextSamplesFor(const CallBase &Inst,			FunctionSamples *getCalleeContextSamplesFor(const CallBase &Inst,
	StringRef CalleeName);			StringRef CalleeName);
	// Get samples for indirect call targets for call site at given location.			// Get samples for indirect call targets for call site at given location.
	std::vector<const FunctionSamples *>			std::vector<const FunctionSamples *>
	getIndirectCalleeContextSamplesFor(const DILocation *DIL);			getIndirectCalleeContextSamplesFor(const DILocation *DIL);
	// Query context profile for a given location. The full context			// Query context profile for a given location. The full context
	// is identified by input DILocation.			// is identified by input DILocation.
	FunctionSamples getContextSamplesFor(const DILocation DIL);			FunctionSamples getContextSamplesFor(const DILocation DIL);
	// Query context profile for a given sample contxt of a function.			// Query context profile for a given sample contxt of a function.
	FunctionSamples *getContextSamplesFor(const SampleContext &Context);			FunctionSamples *getContextSamplesFor(const SampleContext &Context);
				// Get all context profile for given function.
				ContextSamplesTy &getAllContextSamplesFor(const Function &Func);
				ContextSamplesTy &getAllContextSamplesFor(StringRef Name);
	// Query base profile for a given function. A base profile is a merged view			// Query base profile for a given function. A base profile is a merged view
	// of all context profiles for contexts that are not inlined.			// of all context profiles for contexts that are not inlined.
	FunctionSamples *getBaseSamplesFor(const Function &Func,			FunctionSamples *getBaseSamplesFor(const Function &Func,
	bool MergeContext = true);			bool MergeContext = true);
	// Query base profile for a given function by name.			// Query base profile for a given function by name.
	FunctionSamples *getBaseSamplesFor(StringRef Name, bool MergeContext);			FunctionSamples *getBaseSamplesFor(StringRef Name, bool MergeContext);
	// Mark a context profile as inlined when function is inlined.			// Mark a context profile as inlined when function is inlined.
	// This makes sure that inlined context profile will be excluded in			// This makes sure that inlined context profile will be excluded in
	// function's base profile.			// function's base profile.
	void markContextSamplesInlined(const FunctionSamples *InlinedSamples);			void markContextSamplesInlined(const FunctionSamples *InlinedSamples);
				void promoteMergeContextSamplesTree(const Instruction &Inst,
				StringRef CalleeName);
				void addCallGraphEdges(CallGraph &CG, StringMap<Function *> &SymbolMap);
	// Dump the internal context profile trie.			// Dump the internal context profile trie.
	void dump();			void dump();

	private:			private:
	ContextTrieNode *getContextFor(const DILocation *DIL);			ContextTrieNode *getContextFor(const DILocation *DIL);
	ContextTrieNode *getContextFor(const SampleContext &Context);			ContextTrieNode *getContextFor(const SampleContext &Context);
	ContextTrieNode *getCalleeContextFor(const DILocation *DIL,			ContextTrieNode *getCalleeContextFor(const DILocation *DIL,
	StringRef CalleeName);			StringRef CalleeName);
	ContextTrieNode *getOrCreateContextPath(const SampleContext &Context,			ContextTrieNode *getOrCreateContextPath(const SampleContext &Context,
	bool AllowCreate);			bool AllowCreate);
	ContextTrieNode *getTopLevelContextNode(StringRef FName);			ContextTrieNode *getTopLevelContextNode(StringRef FName);
	ContextTrieNode &addTopLevelContextNode(StringRef FName);			ContextTrieNode &addTopLevelContextNode(StringRef FName);
	ContextTrieNode &promoteMergeContextSamplesTree(ContextTrieNode &NodeToPromo);			ContextTrieNode &promoteMergeContextSamplesTree(ContextTrieNode &NodeToPromo);
	void promoteMergeContextSamplesTree(const Instruction &Inst,
	StringRef CalleeName);
	void mergeContextNode(ContextTrieNode &FromNode, ContextTrieNode &ToNode,			void mergeContextNode(ContextTrieNode &FromNode, ContextTrieNode &ToNode,
	StringRef ContextStrToRemove);			StringRef ContextStrToRemove);
	ContextTrieNode &promoteMergeContextSamplesTree(ContextTrieNode &FromNode,			ContextTrieNode &promoteMergeContextSamplesTree(ContextTrieNode &FromNode,
	ContextTrieNode &ToNodeParent,			ContextTrieNode &ToNodeParent,
	StringRef ContextStrToRemove);			StringRef ContextStrToRemove);

	// Map from function name to context profiles (excluding base profile)			// Map from function name to context profiles (excluding base profile)
	StringMap<SmallSet<FunctionSamples *, 16>> FuncToCtxtProfileSet;			StringMap<ContextSamplesTy> FuncToCtxtProfileSet;

	// Root node for context trie tree			// Root node for context trie tree
	ContextTrieNode RootContext;			ContextTrieNode RootContext;
	};			};

	} // end namespace llvm			} // end namespace llvm
	#endif // LLVM_TRANSFORMS_IPO_SAMPLECONTEXTTRACKER_H			#endif // LLVM_TRANSFORMS_IPO_SAMPLECONTEXTTRACKER_H

llvm/lib/Transforms/IPO/SampleContextTracker.cpp

Show First 20 Lines • Show All 257 Lines • ▼ Show 20 Lines
SampleContextTracker::getContextSamplesFor(const SampleContext &Context) {		SampleContextTracker::getContextSamplesFor(const SampleContext &Context) {
ContextTrieNode *Node = getContextFor(Context);		ContextTrieNode *Node = getContextFor(Context);
if (!Node)		if (!Node)
return nullptr;		return nullptr;

return Node->getFunctionSamples();		return Node->getFunctionSamples();
}		}

		SampleContextTracker::ContextSamplesTy &
		SampleContextTracker::getAllContextSamplesFor(const Function &Func) {
		StringRef CanonName = FunctionSamples::getCanonicalFnName(Func);
		return FuncToCtxtProfileSet[CanonName];
		}

		SampleContextTracker::ContextSamplesTy &
		SampleContextTracker::getAllContextSamplesFor(StringRef Name) {
		return FuncToCtxtProfileSet[Name];
		}

FunctionSamples *SampleContextTracker::getBaseSamplesFor(const Function &Func,		FunctionSamples *SampleContextTracker::getBaseSamplesFor(const Function &Func,
bool MergeContext) {		bool MergeContext) {
StringRef CanonName = FunctionSamples::getCanonicalFnName(Func);		StringRef CanonName = FunctionSamples::getCanonicalFnName(Func);
return getBaseSamplesFor(CanonName, MergeContext);		return getBaseSamplesFor(CanonName, MergeContext);
}		}

FunctionSamples *SampleContextTracker::getBaseSamplesFor(StringRef Name,		FunctionSamples *SampleContextTracker::getBaseSamplesFor(StringRef Name,
bool MergeContext) {		bool MergeContext) {
▲ Show 20 Lines • Show All 271 Lines • ▼ Show 20 Lines	ContextTrieNode &SampleContextTracker::promoteMergeContextSamplesTree(

// For root of subtree, remove itself from old parent too		// For root of subtree, remove itself from old parent too
if (MoveToRoot)		if (MoveToRoot)
FromNodeParent.removeChildContext(OldCallSiteLoc, ToNode->getFuncName());		FromNodeParent.removeChildContext(OldCallSiteLoc, ToNode->getFuncName());

return *ToNode;		return *ToNode;
}		}

		// Replace call graph edges with dynamic call edges from the profile.
		void SampleContextTracker::addCallGraphEdges(CallGraph &CG,
		StringMap<Function *> &SymbolMap) {
		// Add profile call edges to the call graph.
		std::queue<ContextTrieNode *> NodeQueue;
		NodeQueue.push(&RootContext);
		while (!NodeQueue.empty()) {
		ContextTrieNode *Node = NodeQueue.front();
		NodeQueue.pop();
		Function *F = SymbolMap.lookup(Node->getFuncName());
		for (auto &I : Node->getAllChildContext()) {
		ContextTrieNode *ChildNode = &I.second;
		NodeQueue.push(ChildNode);
		if (F && !F->isDeclaration()) {
		Function *Callee = SymbolMap.lookup(ChildNode->getFuncName());
		if (Callee && !Callee->isDeclaration())
		CG[F]->addCalledFunction(nullptr, CG[Callee]);
		}
		}
		}
		}
} // namespace llvm		} // namespace llvm
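
The breadth-first walk in SampleContextTracker::addCallGraphEdges can be illustrated outside LLVM. The following is a minimal sketch, not the patch's code: TrieNode, EdgeSet and collectProfileEdges are hypothetical names, and a plain set of defined function names stands in for the SymbolMap lookup plus the isDeclaration() check.

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ContextTrieNode: a function name plus the callee
// contexts recorded beneath it. The root node has an empty name.
struct TrieNode {
  std::string FuncName;
  std::vector<TrieNode> Children;
};

using EdgeSet = std::set<std::pair<std::string, std::string>>;

// Walk the trie breadth-first. A caller -> callee edge is recorded only when
// both names are in Defined, which plays the role of a successful SymbolMap
// lookup of a function that has a body.
EdgeSet collectProfileEdges(const TrieNode &Root,
                            const std::set<std::string> &Defined) {
  EdgeSet Edges;
  std::queue<const TrieNode *> Work;
  Work.push(&Root);
  while (!Work.empty()) {
    const TrieNode *Node = Work.front();
    Work.pop();
    assert(Node && "the work queue never holds a null node");
    const bool CallerOk = Defined.count(Node->FuncName) != 0;
    for (const TrieNode &Child : Node->Children) {
      // Children are always queued, even when the current node is skipped,
      // so edges deeper in the context are still discovered.
      Work.push(&Child);
      if (CallerOk && Defined.count(Child.FuncName) != 0)
        Edges.insert({Node->FuncName, Child.FuncName});
    }
  }
  return Edges;
}
```

For a trie root -> main -> _Z5funcAi -> _Z8funcLeafi with all three functions defined, the sketch yields the two edges main -> _Z5funcAi and _Z5funcAi -> _Z8funcLeafi. The unnamed root contributes no edge because its name is never in the defined set.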

llvm/lib/Transforms/IPO/SampleProfile.cpp

Show First 20 Lines • Show All 171 Lines • ▼ Show 20 Lines	cl::desc("Merge past inlinee's profile to outline version if sample "
"enabled. "));		"enabled. "));

static cl::opt<bool> ProfileTopDownLoad(		static cl::opt<bool> ProfileTopDownLoad(
"sample-profile-top-down-load", cl::Hidden, cl::init(true),		"sample-profile-top-down-load", cl::Hidden, cl::init(true),
cl::desc("Do profile annotation and inlining for functions in top-down "		cl::desc("Do profile annotation and inlining for functions in top-down "
"order of call graph during sample profile loading. It only "		"order of call graph during sample profile loading. It only "
"works for new pass manager. "));		"works for new pass manager. "));

		static cl::opt<bool> UseProfileIndirectCallEdges(
		"use-profile-indirect-call-edges", cl::init(true), cl::Hidden,
		cl::desc("Considering indirect call samples from profile when top-down "
		"processing functions. Only CSSPGO is supported."));

		static cl::opt<bool> UseProfileTopDownOrder(
		"use-profile-top-down-order", cl::init(false), cl::Hidden,
		cl::desc("Process functions in one SCC in a top-down order "
		"based on the input profile."));

static cl::opt<bool> ProfileSizeInline(		static cl::opt<bool> ProfileSizeInline(
"sample-profile-inline-size", cl::Hidden, cl::init(false),		"sample-profile-inline-size", cl::Hidden, cl::init(false),
cl::desc("Inline cold call sites in profile loader if it's beneficial "		cl::desc("Inline cold call sites in profile loader if it's beneficial "
"for code size."));		"for code size."));

static cl::opt<int> ProfileInlineGrowthLimit(		static cl::opt<int> ProfileInlineGrowthLimit(
"sample-profile-inline-growth-limit", cl::Hidden, cl::init(12),		"sample-profile-inline-growth-limit", cl::Hidden, cl::init(12),
cl::desc("The size growth ratio limit for priority-based sample profile "		cl::desc("The size growth ratio limit for priority-based sample profile "
▲ Show 20 Lines • Show All 339 Lines • ▼ Show 20 Lines	protected:
inlineHotFunctionsWithPriority(Function &F,		inlineHotFunctionsWithPriority(Function &F,
DenseSet<GlobalValue::GUID> &InlinedGUIDs);		DenseSet<GlobalValue::GUID> &InlinedGUIDs);
// Inline cold/small functions in addition to hot ones		// Inline cold/small functions in addition to hot ones
bool shouldInlineColdCallee(CallBase &CallInst);		bool shouldInlineColdCallee(CallBase &CallInst);
void emitOptimizationRemarksForInlineCandidates(		void emitOptimizationRemarksForInlineCandidates(
const SmallVectorImpl<CallBase *> &Candidates, const Function &F,		const SmallVectorImpl<CallBase *> &Candidates, const Function &F,
bool Hot);		bool Hot);
std::vector<Function *> buildFunctionOrder(Module &M, CallGraph *CG);		std::vector<Function *> buildFunctionOrder(Module &M, CallGraph *CG);
		void addCallGraphEdges(CallGraph &CG, const FunctionSamples &Samples);
		void replaceCallGraphEdges(CallGraph &CG, StringMap<Function *> &SymbolMap);
void generateMDProfMetadata(Function &F);		void generateMDProfMetadata(Function &F);

/// Map from function name to Function *. Used to find the function from		/// Map from function name to Function *. Used to find the function from
/// the function name. If the function name contains suffix, additional		/// the function name. If the function name contains suffix, additional
/// entry is added to map from the stripped name to the function if there		/// entry is added to map from the stripped name to the function if there
/// is one-to-one mapping.		/// is one-to-one mapping.
StringMap<Function *> SymbolMap;		StringMap<Function *> SymbolMap;

▲ Show 20 Lines • Show All 1,793 Lines • ▼ Show 20 Lines	INITIALIZE_PASS_BEGIN(SampleProfileLoaderLegacyPass, "sample-profile",
"Sample Profile loader", false, false)		"Sample Profile loader", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)
INITIALIZE_PASS_END(SampleProfileLoaderLegacyPass, "sample-profile",		INITIALIZE_PASS_END(SampleProfileLoaderLegacyPass, "sample-profile",
"Sample Profile loader", false, false)		"Sample Profile loader", false, false)

		// Add inlined profile call edges to the call graph.
		void SampleProfileLoader::addCallGraphEdges(CallGraph &CG,
		const FunctionSamples &Samples) {
		Function *Caller = SymbolMap.lookup(Samples.getFuncName());
		if (!Caller || Caller->isDeclaration())
		return;

		// Skip non-inlined call edges which are not important since top down inlining
		// for non-CS profile is to get more precise profile matching, not to enable
		// more inlining.

		for (const auto &CallsiteSamples : Samples.getCallsiteSamples()) {
		for (const auto &InlinedSamples : CallsiteSamples.second) {
		Function *Callee = SymbolMap.lookup(InlinedSamples.first);
		wmi (inline comment, Not Done): We may not need this block. Top down inlining for non-CS profile is to get more precise profile matching, not to enable more inlining. If there is not inline instance profile for a callsite, early inlining in sample loader won't inline it so it doesn't need to be added into the dynamic call graph.
		hoy (author, Done): Agreed, this is not needed since it does not add any benefit to profile counts returning.
		if (Callee && !Callee->isDeclaration())
		CG[Caller]->addCalledFunction(nullptr, CG[Callee]);
		addCallGraphEdges(CG, InlinedSamples.second);
		}
		}
		}

		// Replace call graph edges with dynamic call edges from the profile.
		void SampleProfileLoader::replaceCallGraphEdges(
		CallGraph &CG, StringMap<Function *> &SymbolMap) {
		// Remove static call edges from the call graph except for the ones from the
		// root which make the call graph connected.
		for (const auto &Node : CG)
		if (Node.second.get() != CG.getExternalCallingNode())
		Node.second->removeAllCalledFunctions();

		// Add profile call edges to the call graph.
		if (ProfileIsCS) {
		ContextTracker->addCallGraphEdges(CG, SymbolMap);
		} else {
		for (const auto &Samples : Reader->getProfiles())
		addCallGraphEdges(CG, Samples.second);
		}
		}

std::vector<Function *>		std::vector<Function *>
SampleProfileLoader::buildFunctionOrder(Module &M, CallGraph *CG) {		SampleProfileLoader::buildFunctionOrder(Module &M, CallGraph *CG) {
std::vector<Function *> FunctionOrderList;		std::vector<Function *> FunctionOrderList;
FunctionOrderList.reserve(M.size());		FunctionOrderList.reserve(M.size());

if (!ProfileTopDownLoad || CG == nullptr) {		if (!ProfileTopDownLoad || CG == nullptr) {
if (ProfileMergeInlinee) {		if (ProfileMergeInlinee) {
// Disable ProfileMergeInlinee if profile is not loaded in top down order,		// Disable ProfileMergeInlinee if profile is not loaded in top down order,
// because the profile for a function may be used for the profile		// because the profile for a function may be used for the profile
// annotation of its outline copy before the profile merging of its		// annotation of its outline copy before the profile merging of its
// non-inlined inline instances, and that is not the way how		// non-inlined inline instances, and that is not the way how
// ProfileMergeInlinee is supposed to work.		// ProfileMergeInlinee is supposed to work.
ProfileMergeInlinee = false;		ProfileMergeInlinee = false;
}		}

for (Function &F : M)		for (Function &F : M)
if (!F.isDeclaration() && F.hasFnAttribute("use-sample-profile"))		if (!F.isDeclaration() && F.hasFnAttribute("use-sample-profile"))
FunctionOrderList.push_back(&F);		FunctionOrderList.push_back(&F);
return FunctionOrderList;		return FunctionOrderList;
}		}

assert(&CG->getModule() == &M);		assert(&CG->getModule() == &M);

		// Add indirect call edges from profile to augment the static call graph.
		// Functions will be processed in a top-down order defined by the static call
		// graph. Adjusting the order by considering indirect call edges from the
		// profile (which don't exist in the static call graph) can enable the
		// inlining of indirect call targets by processing the caller before them.
		// TODO: enable this for non-CS profile and fix the counts returning logic to
		// have a full support for indirect calls.
		if (UseProfileIndirectCallEdges && ProfileIsCS) {
		for (auto &Entry : *CG) {
		const auto *F = Entry.first;
		if (!F || F->isDeclaration() || !F->hasFnAttribute("use-sample-profile"))
		continue;
		auto &AllContexts = ContextTracker->getAllContextSamplesFor(F->getName());
		if (AllContexts.empty())
		continue;

		for (const auto &BB : *F) {
		for (const auto &I : BB.getInstList()) {
		const auto *CB = dyn_cast<CallBase>(&I);
		if (!CB || !CB->isIndirectCall())
		continue;
		const DebugLoc &DLoc = I.getDebugLoc();
		if (!DLoc)
		continue;
		auto CallSite = FunctionSamples::getCallSiteIdentifier(DLoc);
		for (FunctionSamples *Samples : AllContexts) {
		if (auto CallTargets = Samples->findCallTargetMapAt(CallSite)) {
		for (const auto &Target : CallTargets.get()) {
		Function *Callee = SymbolMap.lookup(Target.first());
		if (Callee && !Callee->isDeclaration())
		Entry.second->addCalledFunction(nullptr, (*CG)[Callee]);
		}
		}
		}
		}
		}
		}
		}

		// Compute a top-down order from the profile, which is used to sort functions in
		// one SCC later. The static processing order computed for an SCC may not
		// reflect the call contexts in the context-sensitive profile, thus may cause
		// potential inlining to be overlooked. The function order in one SCC is being
		// adjusted to a top-down order based on the profile to favor more inlining.
		DenseMap<Function *, uint64_t> ProfileOrderMap;
		if (UseProfileTopDownOrder ||
		(ProfileIsCS && !UseProfileTopDownOrder.getNumOccurrences())) {
		// Create a static call graph. The call edges are not important since they
		// will be replaced by dynamic edges from the profile.
		CallGraph ProfileCG(M);
		replaceCallGraphEdges(ProfileCG, SymbolMap);
		scc_iterator<CallGraph *> CGI = scc_begin(&ProfileCG);
		uint64_t I = 0;
		while (!CGI.isAtEnd()) {
		for (CallGraphNode *Node : *CGI) {
		if (auto *F = Node->getFunction())
		ProfileOrderMap[F] = ++I;
		}
		++CGI;
		}
		}

scc_iterator<CallGraph *> CGI = scc_begin(CG);		scc_iterator<CallGraph *> CGI = scc_begin(CG);
while (!CGI.isAtEnd()) {		while (!CGI.isAtEnd()) {
for (CallGraphNode *node : *CGI) {		uint64_t Start = FunctionOrderList.size();
auto F = node->getFunction();		for (CallGraphNode *Node : *CGI) {
		auto *F = Node->getFunction();
if (F && !F->isDeclaration() && F->hasFnAttribute("use-sample-profile"))		if (F && !F->isDeclaration() && F->hasFnAttribute("use-sample-profile"))
FunctionOrderList.push_back(F);		FunctionOrderList.push_back(F);
}		}

		// Sort nodes in SCC based on the profile top-down order.
		if (!ProfileOrderMap.empty()) {
		std::stable_sort(FunctionOrderList.begin() + Start,
		wmi (inline comment, Not Done): Many nodes in the static graph may not exist in ProfileOrderMap so they all get the same 0 value from the map. Better use llvm::stable_sort.
		hoy (author, Done): Good catch! Stable sort should make the change of existing static order minimized.
		FunctionOrderList.end(),
		[&ProfileOrderMap](Function *Left, Function *Right) {
		return ProfileOrderMap[Left] < ProfileOrderMap[Right];
		});
		}

++CGI;		++CGI;
}		}

		LLVM_DEBUG({
		dbgs() << "Function processing order:\n";
		for (auto F : reverse(FunctionOrderList)) {
		dbgs() << F->getName() << "\n";
		}
		});

std::reverse(FunctionOrderList.begin(), FunctionOrderList.end());		std::reverse(FunctionOrderList.begin(), FunctionOrderList.end());
return FunctionOrderList;		return FunctionOrderList;
}		}

bool SampleProfileLoader::doInitialization(Module &M,		bool SampleProfileLoader::doInitialization(Module &M,
FunctionAnalysisManager *FAM) {		FunctionAnalysisManager *FAM) {
auto &Ctx = M.getContext();		auto &Ctx = M.getContext();

▲ Show 20 Lines • Show All 136 Lines • ▼ Show 20 Lines	bool SampleProfileLoaderLegacyPass::runOnModule(Module &M) {
TTIWP = &getAnalysis<TargetTransformInfoWrapperPass>();		TTIWP = &getAnalysis<TargetTransformInfoWrapperPass>();
TLIWP = &getAnalysis<TargetLibraryInfoWrapperPass>();		TLIWP = &getAnalysis<TargetLibraryInfoWrapperPass>();
ProfileSummaryInfo *PSI =		ProfileSummaryInfo *PSI =
&getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();		&getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
return SampleLoader.runOnModule(M, nullptr, PSI, nullptr);		return SampleLoader.runOnModule(M, nullptr, PSI, nullptr);
}		}

bool SampleProfileLoader::runOnFunction(Function &F, ModuleAnalysisManager *AM) {		bool SampleProfileLoader::runOnFunction(Function &F, ModuleAnalysisManager *AM) {
		LLVM_DEBUG(dbgs() << "\n\nProcessing Function " << F.getName() << "\n");
DILocation2SampleMap.clear();		DILocation2SampleMap.clear();
// By default the entry count is initialized to -1, which will be treated		// By default the entry count is initialized to -1, which will be treated
// conservatively by getEntryCount as the same as unknown (None). This is		// conservatively by getEntryCount as the same as unknown (None). This is
// to avoid newly added code to be treated as cold. If we have samples		// to avoid newly added code to be treated as cold. If we have samples
// this will be overwritten in emitAnnotations.		// this will be overwritten in emitAnnotations.
uint64_t initialEntryCount = -1;		uint64_t initialEntryCount = -1;

ProfAccForSymsInList = ProfileAccurateForSymsInList && PSL;		ProfAccForSymsInList = ProfileAccurateForSymsInList && PSL;
▲ Show 20 Lines • Show All 90 Lines • Show Last 20 Lines
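
The reviewer's remark about ties matters because every function missing from ProfileOrderMap reads back as 0. A minimal standalone sketch of the per-SCC reordering follows, assuming strings in place of Function pointers; sortSCCByProfile and ProfileRank are hypothetical names, not part of the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Reorder the functions appended for one SCC, i.e. Order[Start, end), by
// their rank in the profile call graph. A function absent from ProfileRank
// gets rank 0, so many entries may compare equal; a stable sort keeps the
// existing static order among those ties.
void sortSCCByProfile(std::vector<std::string> &Order, std::size_t Start,
                      const std::map<std::string, std::uint64_t> &ProfileRank) {
  assert(Start <= Order.size() && "SCC start must lie inside the list");
  auto Rank = [&ProfileRank](const std::string &F) -> std::uint64_t {
    auto It = ProfileRank.find(F);
    return It == ProfileRank.end() ? 0 : It->second;
  };
  std::stable_sort(Order.begin() + static_cast<std::ptrdiff_t>(Start),
                   Order.end(),
                   [&Rank](const std::string &L, const std::string &R) {
                     return Rank(L) < Rank(R);
                   });
}
```

Because the whole list is reversed at the end of buildFunctionOrder, a lower rank here means a later position in the final processing order. With ranks _Z8funcLeafi = 1 and _Z5funcAi = 2, the SCC range becomes (_Z8funcLeafi, _Z5funcAi), which after the final reversal places _Z5funcAi ahead of _Z8funcLeafi.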

llvm/test/Transforms/SampleProfile/Inputs/profile-context-order.prof

This file was added.

				[main:3 @ _Z5funcAi:1 @ _Z8funcLeafi]:1467299:11
				 0: 6
				 1: 6
				 3: 287884
				 15: 23
				[main:3.1 @ _Z5funcBi:1 @ _Z8funcLeafi]:500853:20
				 0: 15
				 1: 15
				 3: 74946
				 10: 23324
				 15: 11
				[main]:154:0
				 2: 12
				 3: 18 _Z5funcAi:11
				 3.1: 18 _Z5funcBi:19
				[external:12 @ main]:154:12
				 2: 12
				 3: 10 _Z5funcAi:7
				 3.1: 10 _Z5funcBi:11
				[main:3.1 @ _Z5funcBi]:120:19
				 0: 19
				 1: 19 _Z8funcLeafi:20
				 3: 12
				[externalA:17 @ _Z5funcBi]:120:3
				 0: 3
				 1: 3
				[external:10 @ _Z5funcBi]:120:10
				 0: 10
				 1: 10
				[main:3 @ _Z5funcAi]:99:11
				 0: 10
				 1: 10 _Z8funcLeafi:11
				 2: 287864 _Z3fibi:315608
				 3: 24
				[main:3 @ _Z5funcAi:2 @ _Z3fibi]:287864:315608
				 0: 362839
				 1: 6
				 3: 287884
				No newline at end of file
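
Each bracketed header above encodes a calling context, and consecutive frames in it are the dynamic call edges that the top-down ordering relies on. The sketch below reads one header line and lists those edges. It assumes only the shape visible in this file, frames separated by " @ " with a ":callsite" suffix on every frame except the last; Frame, parseContext and contextEdges are hypothetical helpers, not LLVM's profile reader.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Frame {
  std::string Func;     // function name
  std::string CallSite; // "line" or "line.discriminator"; empty for the leaf
};

// Parse "[f:3 @ g:1 @ h]:total:head" into frames. Returns an empty vector
// when the line carries no bracketed context.
std::vector<Frame> parseContext(const std::string &Header) {
  std::vector<Frame> Frames;
  const std::size_t Open = Header.find('[');
  const std::size_t Close = Header.find(']');
  if (Open == std::string::npos || Close == std::string::npos || Close < Open)
    return Frames;
  const std::string Ctx = Header.substr(Open + 1, Close - Open - 1);
  if (Ctx.empty())
    return Frames;
  const std::string Sep = " @ ";
  std::size_t Pos = 0;
  while (true) {
    const std::size_t Next = Ctx.find(Sep, Pos);
    const std::string Part =
        Ctx.substr(Pos, Next == std::string::npos ? Next : Next - Pos);
    const std::size_t Colon = Part.find(':');
    if (Colon == std::string::npos)
      Frames.push_back({Part, ""});
    else
      Frames.push_back({Part.substr(0, Colon), Part.substr(Colon + 1)});
    if (Next == std::string::npos)
      break;
    Pos = Next + Sep.size();
  }
  return Frames;
}

// Consecutive frames form caller -> callee edges.
std::vector<std::pair<std::string, std::string>>
contextEdges(const std::vector<Frame> &Frames) {
  std::vector<std::pair<std::string, std::string>> Edges;
  for (std::size_t I = 1; I < Frames.size(); ++I)
    Edges.emplace_back(Frames[I - 1].Func, Frames[I].Func);
  assert(Frames.empty() || Edges.size() + 1 == Frames.size());
  return Edges;
}
```

Applied to the header [main:3.1 @ _Z5funcBi:1 @ _Z8funcLeafi]:500853:20, this gives the frames main (call site 3.1), _Z5funcBi (call site 1) and _Z8funcLeafi, and therefore the edges main -> _Z5funcBi and _Z5funcBi -> _Z8funcLeafi.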

llvm/test/Transforms/SampleProfile/Inputs/profile-topdown-order.prof

This file was added.

				_Z8funcLeafi:500853:20
				0: 15
				1: 15
				3: 74946
				10: 23324
				15: 11
				main:154:0
				2: 12
				3: 18 _Z5funcAi:11
				3.1: 18 _Z5funcBi:19
				main:154:12
				2: 12
				3: 10 _Z5funcAi:7
				3.1: 10 _Z5funcBi:11
				_Z5funcBi:120:19
				0: 19
				1: 19 _Z8funcLeafi:20
				3: 12
				_Z5funcBi:120:3
				0: 3
				1: 3
				_Z5funcBi:120:10
				0: 10
				1: 10
				_Z5funcAi:99:11
				0: 10
				1: _Z8funcLeafi:40
				0: 6
				1: 6
				3: 2
				15: 23
				2: 315608 _Z3fibi:362839
				0: 315608
				1: 6
				3: 287884
				3: 24

llvm/test/Transforms/SampleProfile/profile-context-order.ll

This file was added.

				;; Test for different function processing orders affecting inlining in sample profile loader.

				;; There is an SCC _Z5funcAi -> _Z8funcLeafi -> _Z5funcAi in the program.
				;; With -use-profile-top-down-order=0, the top-down processing order of
				;; that SCC is (_Z8funcLeafi, _Z5funcAi), which is determined based on
				;; the static call graph. With -use-profile-top-down-order=1, call edges
				;; from profile are considered, thus the order becomes (_Z5funcAi, _Z8funcLeafi)
				;; which leads to _Z8funcLeafi inlined into _Z5funcAi.
				; RUN: opt < %s -passes=sample-profile -use-profile-top-down-order=1 -sample-profile-file=%S/Inputs/profile-context-order.prof -S | FileCheck %s -check-prefix=INLINE
				; RUN: opt < %s -passes=sample-profile -use-profile-top-down-order=0 -sample-profile-file=%S/Inputs/profile-context-order.prof -S | FileCheck %s -check-prefix=NOINLINE

				;; There is an indirect call _Z5funcAi -> _Z3fibi in the program.
				;; With -use-profile-indirect-call-edges=0, the processing order computed
				;; based on the static call graph is (_Z3fibi, _Z5funcAi). With
				;; -use-profile-indirect-call-edges=1, the indirect call edge from profile is
				;; considered, thus the order becomes (_Z5funcAi, _Z3fibi) which leads to
				;; _Z3fibi inlined into _Z5funcAi.
				; RUN: opt < %s -passes=sample-profile -use-profile-indirect-call-edges=1 -sample-profile-file=%S/Inputs/profile-context-order.prof -S | FileCheck %s -check-prefix=ICALL-INLINE
				; RUN: opt < %s -passes=sample-profile -use-profile-indirect-call-edges=0 -sample-profile-file=%S/Inputs/profile-context-order.prof -S | FileCheck %s -check-prefix=ICALL-NOINLINE

				@factor = dso_local global i32 3, align 4, !dbg !0
				@fp = dso_local global i32 (i32)* null, align 8

				define dso_local i32 @main() local_unnamed_addr #0 !dbg !18 {
				entry:
				store i32 (i32)* @_Z3fibi, i32 (i32)** @fp, align 8, !dbg !25
				br label %for.body, !dbg !25

				for.cond.cleanup: ; preds = %for.body
				ret i32 %add3, !dbg !27

				for.body: ; preds = %for.body, %entry
				%x.011 = phi i32 [ 300000, %entry ], [ %dec, %for.body ]
				%r.010 = phi i32 [ 0, %entry ], [ %add3, %for.body ]
				%call = tail call i32 @_Z5funcBi(i32 %x.011), !dbg !32
				%add = add nuw nsw i32 %x.011, 1, !dbg !31
				%call1 = tail call i32 @_Z5funcAi(i32 %add), !dbg !28
				%add2 = add i32 %call, %r.010, !dbg !34
				%add3 = add i32 %add2, %call1, !dbg !35
				%dec = add nsw i32 %x.011, -1, !dbg !36
				%cmp = icmp eq i32 %x.011, 0, !dbg !38
				br i1 %cmp, label %for.cond.cleanup, label %for.body, !dbg !25
				}

				; INLINE: define dso_local i32 @_Z5funcAi
				; INLINE-NOT: call i32 @_Z8funcLeafi
				; NOINLINE: define dso_local i32 @_Z5funcAi
				; NOINLINE: call i32 @_Z8funcLeafi
				; ICALL-INLINE: define dso_local i32 @_Z5funcAi
				; ICALL-INLINE: call i32 @_Z3foo
				; ICALL-NOINLINE: define dso_local i32 @_Z5funcAi
				; ICALL-NOINLINE-NOT: call i32 @_Z3foo
				; ICALL-NOINLINE-NOT: call i32 @_Z3fibi
				define dso_local i32 @_Z5funcAi(i32 %x) local_unnamed_addr #0 !dbg !40 {
				entry:
				%add = add nsw i32 %x, 100000, !dbg !44
				%0 = load i32 (i32)*, i32 (i32)** @fp, align 8
				%call = call i32 %0(i32 8), !dbg !45
				%call1 = tail call i32 @_Z8funcLeafi(i32 %add), !dbg !46
				ret i32 %call, !dbg !46
				}

				; INLINE: define dso_local i32 @_Z8funcLeafi
				; NOINLINE: define dso_local i32 @_Z8funcLeafi
				; ICALL-INLINE: define dso_local i32 @_Z8funcLeafi
				; ICALL-NOINLINE: define dso_local i32 @_Z8funcLeafi
				define dso_local i32 @_Z8funcLeafi(i32 %x) local_unnamed_addr #1 !dbg !54 {
				entry:
				%cmp = icmp sgt i32 %x, 0, !dbg !57
				br i1 %cmp, label %while.body, label %while.cond2.preheader, !dbg !59

				while.cond2.preheader: ; preds = %entry
				%cmp313 = icmp slt i32 %x, 0, !dbg !60
				br i1 %cmp313, label %while.body4, label %if.end, !dbg !63

				while.body: ; preds = %while.body, %entry
				%x.addr.016 = phi i32 [ %sub, %while.body ], [ %x, %entry ]
				%tmp = load volatile i32, i32* @factor, align 4, !dbg !64
				%call = tail call i32 @_Z5funcAi(i32 %tmp), !dbg !67
				%sub = sub nsw i32 %x.addr.016, %call, !dbg !68
				%cmp1 = icmp sgt i32 %sub, 0, !dbg !69
				br i1 %cmp1, label %while.body, label %if.end, !dbg !71

				while.body4: ; preds = %while.body4, %while.cond2.preheader
				%x.addr.114 = phi i32 [ %add, %while.body4 ], [ %x, %while.cond2.preheader ]
				%tmp1 = load volatile i32, i32* @factor, align 4, !dbg !72
				%call5 = tail call i32 @_Z5funcBi(i32 %tmp1), !dbg !74
				%add = add nsw i32 %call5, %x.addr.114, !dbg !75
				%cmp3 = icmp slt i32 %add, 0, !dbg !60
				br i1 %cmp3, label %while.body4, label %if.end, !dbg !63

				if.end: ; preds = %while.body4, %while.body, %while.cond2.preheader
				%x.addr.2 = phi i32 [ 0, %while.cond2.preheader ], [ %sub, %while.body ], [ %add, %while.body4 ]
				ret i32 %x.addr.2, !dbg !76
				}

				define dso_local i32 @_Z5funcBi(i32 %x) local_unnamed_addr #0 !dbg !47 {
				entry:
				%sub = add nsw i32 %x, -100000, !dbg !51
				%call = tail call i32 @_Z8funcLeafi(i32 %sub), !dbg !52
				ret i32 %call, !dbg !53
				}

				define dso_local i32 @_Z3fibi(i32 %x) local_unnamed_addr #1 !dbg !77 {
				entry:
				%sub = add nsw i32 %x, -100000, !dbg !78
				%call = tail call i32 @_Z3foo(i32 %sub), !dbg !78
				ret i32 %sub, !dbg !78
				}

				declare i32 @_Z3foo(i32)

				attributes #0 = { nofree noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" "use-sample-profile" }
				attributes #1 = { nofree nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" "use-sample-profile" }

				!llvm.dbg.cu = !{!2}
				!llvm.module.flags = !{!14, !15, !16}
				!llvm.ident = !{!17}

				!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
				!1 = distinct !DIGlobalVariable(name: "factor", scope: !2, file: !3, line: 21, type: !13, isLocal: false, isDefinition: true)
				!2 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !3, producer: "clang version 11.0.0", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, retainedTypes: !5, globals: !12, splitDebugInlining: false, debugInfoForProfiling: true, nameTableKind: None)
				!3 = !DIFile(filename: "merged.cpp", directory: "/local/autofdo")
				!4 = !{}
				!5 = !{!6, !10, !11}
				!6 = !DISubprogram(name: "funcA", linkageName: "_Z5funcAi", scope: !3, file: !3, line: 6, type: !7, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !4)
				!7 = !DISubroutineType(types: !8)
				!8 = !{!9, !9}
				!9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!10 = !DISubprogram(name: "funcB", linkageName: "_Z5funcBi", scope: !3, file: !3, line: 7, type: !7, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !4)
				!11 = !DISubprogram(name: "funcLeaf", linkageName: "_Z8funcLeafi", scope: !3, file: !3, line: 22, type: !7, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !4)
				!12 = !{!0}
				!13 = !DIDerivedType(tag: DW_TAG_volatile_type, baseType: !9)
				!14 = !{i32 7, !"Dwarf Version", i32 4}
				!15 = !{i32 2, !"Debug Info Version", i32 3}
				!16 = !{i32 1, !"wchar_size", i32 4}
				!17 = !{!"clang version 11.0.0"}
				!18 = distinct !DISubprogram(name: "main", scope: !3, file: !3, line: 11, type: !19, scopeLine: 11, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !21)
				!19 = !DISubroutineType(types: !20)
				!20 = !{!9}
				!21 = !{!22, !23}
				!22 = !DILocalVariable(name: "r", scope: !18, file: !3, line: 12, type: !9)
				!23 = !DILocalVariable(name: "x", scope: !24, file: !3, line: 13, type: !9)
				!24 = distinct !DILexicalBlock(scope: !18, file: !3, line: 13, column: 3)
				!25 = !DILocation(line: 13, column: 3, scope: !26)
				!26 = !DILexicalBlockFile(scope: !24, file: !3, discriminator: 2)
				!27 = !DILocation(line: 17, column: 3, scope: !18)
				!28 = !DILocation(line: 14, column: 10, scope: !29)
				!29 = distinct !DILexicalBlock(scope: !30, file: !3, line: 13, column: 37)
				!30 = distinct !DILexicalBlock(scope: !24, file: !3, line: 13, column: 3)
				!31 = !DILocation(line: 14, column: 29, scope: !29)
				!32 = !DILocation(line: 14, column: 21, scope: !33)
				!33 = !DILexicalBlockFile(scope: !29, file: !3, discriminator: 2)
				!34 = !DILocation(line: 14, column: 19, scope: !29)
				!35 = !DILocation(line: 14, column: 7, scope: !29)
				!36 = !DILocation(line: 13, column: 33, scope: !37)
				!37 = !DILexicalBlockFile(scope: !30, file: !3, discriminator: 6)
				!38 = !DILocation(line: 13, column: 26, scope: !39)
				!39 = !DILexicalBlockFile(scope: !30, file: !3, discriminator: 2)
				!40 = distinct !DISubprogram(name: "funcA", linkageName: "_Z5funcAi", scope: !3, file: !3, line: 26, type: !7, scopeLine: 26, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
				!44 = !DILocation(line: 26, column: 22, scope: !40)
				!45 = !DILocation(line: 28, column: 11, scope: !40)
				!46 = !DILocation(line: 27, column: 3, scope: !40)
				!47 = distinct !DISubprogram(name: "funcB", linkageName: "_Z5funcBi", scope: !3, file: !3, line: 32, type: !7, scopeLine: 32, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
				!51 = !DILocation(line: 33, column: 22, scope: !47)
				!52 = !DILocation(line: 33, column: 11, scope: !47)
				!53 = !DILocation(line: 35, column: 3, scope: !47)
				!54 = distinct !DISubprogram(name: "funcLeaf", linkageName: "_Z8funcLeafi", scope: !3, file: !3, line: 48, type: !7, scopeLine: 48, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
				!57 = !DILocation(line: 49, column: 9, scope: !58)
				!58 = distinct !DILexicalBlock(scope: !54, file: !3, line: 49, column: 7)
				!59 = !DILocation(line: 49, column: 7, scope: !54)
				!60 = !DILocation(line: 58, column: 14, scope: !61)
				!61 = !DILexicalBlockFile(scope: !62, file: !3, discriminator: 2)
				!62 = distinct !DILexicalBlock(scope: !58, file: !3, line: 56, column: 8)
				!63 = !DILocation(line: 58, column: 5, scope: !61)
				!64 = !DILocation(line: 52, column: 16, scope: !65)
				!65 = distinct !DILexicalBlock(scope: !66, file: !3, line: 51, column: 19)
				!66 = distinct !DILexicalBlock(scope: !58, file: !3, line: 49, column: 14)
				!67 = !DILocation(line: 52, column: 12, scope: !65)
				!68 = !DILocation(line: 52, column: 9, scope: !65)
				!69 = !DILocation(line: 51, column: 14, scope: !70)
				!70 = !DILexicalBlockFile(scope: !66, file: !3, discriminator: 2)
				!71 = !DILocation(line: 51, column: 5, scope: !70)
				!72 = !DILocation(line: 59, column: 16, scope: !73)
				!73 = distinct !DILexicalBlock(scope: !62, file: !3, line: 58, column: 19)
				!74 = !DILocation(line: 59, column: 12, scope: !73)
				!75 = !DILocation(line: 59, column: 9, scope: !73)
				!76 = !DILocation(line: 63, column: 3, scope: !54)
				!77 = distinct !DISubprogram(name: "funcB", linkageName: "_Z3fibi", scope: !3, file: !3, line: 32, type: !7, scopeLine: 32, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
				!78 = !DILocation(line: 33, column: 22, scope: !77)

llvm/test/Transforms/SampleProfile/profile-topdown-order.ll

This file was added.

				;; Test for different function processing orders affecting inlining in sample profile loader.

				;; There is an SCC _Z5funcAi -> _Z8funcLeafi -> _Z5funcAi in the program.
				;; With -use-profile-top-down-order=0, the top-down processing order of
				;; that SCC is (_Z8funcLeafi, _Z5funcAi), which is determined based on
				;; the static call graph. With -use-profile-top-down-order=1, call edges
				;; from profile are considered, thus the order becomes (_Z5funcAi, _Z8funcLeafi).
				;; While _Z8funcLeafi is not supposed to be inlined, the outlined entry counts
				;; are affected.
				; RUN: opt < %s -passes=sample-profile -use-profile-top-down-order=0 -sample-profile-file=%S/Inputs/profile-topdown-order.prof -S | FileCheck %s -check-prefix=STATIC
				; RUN: opt < %s -passes=sample-profile -use-profile-top-down-order=1 -sample-profile-file=%S/Inputs/profile-topdown-order.prof -S | FileCheck %s -check-prefix=DYNAMIC


				; STATIC: define dso_local i32 @_Z8funcLeafi{{.*}} !prof ![[#PROF:]]
				; STATIC: ![[#PROF]] = !{!"function_entry_count", i64 21}
				; DYNAMIC: define dso_local i32 @_Z8funcLeafi{{.*}} !prof ![[#PROF:]]
				; DYNAMIC: ![[#PROF]] = !{!"function_entry_count", i64 27}

				@factor = dso_local global i32 3, align 4, !dbg !0
				@fp = dso_local global i32 (i32)* null, align 8

				define dso_local i32 @main() local_unnamed_addr #0 !dbg !18 {
				entry:
				store i32 (i32)* @_Z3fibi, i32 (i32)** @fp, align 8, !dbg !25
				br label %for.body, !dbg !25

				for.cond.cleanup: ; preds = %for.body
				ret i32 %add3, !dbg !27

				for.body: ; preds = %for.body, %entry
				%x.011 = phi i32 [ 300000, %entry ], [ %dec, %for.body ]
				%r.010 = phi i32 [ 0, %entry ], [ %add3, %for.body ]
				%call = tail call i32 @_Z5funcBi(i32 %x.011), !dbg !32
				%add = add nuw nsw i32 %x.011, 1, !dbg !31
				%call1 = tail call i32 @_Z5funcAi(i32 %add), !dbg !28
				%add2 = add i32 %call, %r.010, !dbg !34
				%add3 = add i32 %add2, %call1, !dbg !35
				%dec = add nsw i32 %x.011, -1, !dbg !36
				%cmp = icmp eq i32 %x.011, 0, !dbg !38
				br i1 %cmp, label %for.cond.cleanup, label %for.body, !dbg !25
				}

				define dso_local i32 @_Z5funcAi(i32 %x) local_unnamed_addr #0 !dbg !40 {
				entry:
				%add = add nsw i32 %x, 100000, !dbg !44
				%0 = load i32 (i32), i32 (i32)* @fp, align 8
				%call = call i32 %0(i32 8), !dbg !45
				%call1 = tail call i32 @_Z8funcLeafi(i32 %add), !dbg !46
				ret i32 %call, !dbg !46
				}

				; INLINE: define dso_local i32 @_Z8funcLeafi
				; NOINLINE: define dso_local i32 @_Z8funcLeafi
				; ICALL-INLINE: define dso_local i32 @_Z8funcLeafi
				; ICALL-NOINLINE: define dso_local i32 @_Z8funcLeafi
				define dso_local i32 @_Z8funcLeafi(i32 %x) local_unnamed_addr #1 !dbg !54 {
				entry:
				%cmp = icmp sgt i32 %x, 0, !dbg !57
				br i1 %cmp, label %while.body, label %while.cond2.preheader, !dbg !59

				while.cond2.preheader: ; preds = %entry
				%cmp313 = icmp slt i32 %x, 0, !dbg !60
				br i1 %cmp313, label %while.body4, label %if.end, !dbg !63

				while.body: ; preds = %while.body, %entry
				%x.addr.016 = phi i32 [ %sub, %while.body ], [ %x, %entry ]
				%tmp = load volatile i32, i32* @factor, align 4, !dbg !64
				%call = tail call i32 @_Z5funcAi(i32 %tmp), !dbg !67
				%sub = sub nsw i32 %x.addr.016, %call, !dbg !68
				%cmp1 = icmp sgt i32 %sub, 0, !dbg !69
				br i1 %cmp1, label %while.body, label %if.end, !dbg !71

				while.body4: ; preds = %while.body4, %while.cond2.preheader
				%x.addr.114 = phi i32 [ %add, %while.body4 ], [ %x, %while.cond2.preheader ]
				%tmp1 = load volatile i32, i32* @factor, align 4, !dbg !72
				%call5 = tail call i32 @_Z5funcBi(i32 %tmp1), !dbg !74
				%add = add nsw i32 %call5, %x.addr.114, !dbg !75
				%cmp3 = icmp slt i32 %add, 0, !dbg !60
				br i1 %cmp3, label %while.body4, label %if.end, !dbg !63

				if.end: ; preds = %while.body4, %while.body, %while.cond2.preheader
				%x.addr.2 = phi i32 [ 0, %while.cond2.preheader ], [ %sub, %while.body ], [ %add, %while.body4 ]
				ret i32 %x.addr.2, !dbg !76
				}

				define dso_local i32 @_Z5funcBi(i32 %x) local_unnamed_addr #0 !dbg !47 {
				entry:
				%sub = add nsw i32 %x, -100000, !dbg !51
				%call = tail call i32 @_Z8funcLeafi(i32 %sub), !dbg !52
				ret i32 %call, !dbg !53
				}

				define dso_local i32 @_Z3fibi(i32 %x) local_unnamed_addr #1 !dbg !77 {
				entry:
				%sub = add nsw i32 %x, -100000, !dbg !78
				%call = tail call i32 @_Z3foo(i32 %sub), !dbg !78
				ret i32 %sub, !dbg !78
				}

				declare i32 @_Z3foo(i32)

				attributes #0 = { nofree noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" "use-sample-profile" }
				attributes #1 = { nofree nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" "use-sample-profile" }

				!llvm.dbg.cu = !{!2}
				!llvm.module.flags = !{!14, !15, !16}
				!llvm.ident = !{!17}

				!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
				!1 = distinct !DIGlobalVariable(name: "factor", scope: !2, file: !3, line: 21, type: !13, isLocal: false, isDefinition: true)
				!2 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !3, producer: "clang version 11.0.0", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, retainedTypes: !5, globals: !12, splitDebugInlining: false, debugInfoForProfiling: true, nameTableKind: None)
				!3 = !DIFile(filename: "merged.cpp", directory: "/local/autofdo")
				!4 = !{}
				!5 = !{!6, !10, !11}
				!6 = !DISubprogram(name: "funcA", linkageName: "_Z5funcAi", scope: !3, file: !3, line: 6, type: !7, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !4)
				!7 = !DISubroutineType(types: !8)
				!8 = !{!9, !9}
				!9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!10 = !DISubprogram(name: "funcB", linkageName: "_Z5funcBi", scope: !3, file: !3, line: 7, type: !7, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !4)
				!11 = !DISubprogram(name: "funcLeaf", linkageName: "_Z8funcLeafi", scope: !3, file: !3, line: 22, type: !7, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !4)
				!12 = !{!0}
				!13 = !DIDerivedType(tag: DW_TAG_volatile_type, baseType: !9)
				!14 = !{i32 7, !"Dwarf Version", i32 4}
				!15 = !{i32 2, !"Debug Info Version", i32 3}
				!16 = !{i32 1, !"wchar_size", i32 4}
				!17 = !{!"clang version 11.0.0"}
				!18 = distinct !DISubprogram(name: "main", scope: !3, file: !3, line: 11, type: !19, scopeLine: 11, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !21)
				!19 = !DISubroutineType(types: !20)
				!20 = !{!9}
				!21 = !{!22, !23}
				!22 = !DILocalVariable(name: "r", scope: !18, file: !3, line: 12, type: !9)
				!23 = !DILocalVariable(name: "x", scope: !24, file: !3, line: 13, type: !9)
				!24 = distinct !DILexicalBlock(scope: !18, file: !3, line: 13, column: 3)
				!25 = !DILocation(line: 13, column: 3, scope: !26)
				!26 = !DILexicalBlockFile(scope: !24, file: !3, discriminator: 2)
				!27 = !DILocation(line: 17, column: 3, scope: !18)
				!28 = !DILocation(line: 14, column: 10, scope: !29)
				!29 = distinct !DILexicalBlock(scope: !30, file: !3, line: 13, column: 37)
				!30 = distinct !DILexicalBlock(scope: !24, file: !3, line: 13, column: 3)
				!31 = !DILocation(line: 14, column: 29, scope: !29)
				!32 = !DILocation(line: 14, column: 21, scope: !33)
				!33 = !DILexicalBlockFile(scope: !29, file: !3, discriminator: 2)
				!34 = !DILocation(line: 14, column: 19, scope: !29)
				!35 = !DILocation(line: 14, column: 7, scope: !29)
				!36 = !DILocation(line: 13, column: 33, scope: !37)
				!37 = !DILexicalBlockFile(scope: !30, file: !3, discriminator: 6)
				!38 = !DILocation(line: 13, column: 26, scope: !39)
				!39 = !DILexicalBlockFile(scope: !30, file: !3, discriminator: 2)
				!40 = distinct !DISubprogram(name: "funcA", linkageName: "_Z5funcAi", scope: !3, file: !3, line: 26, type: !7, scopeLine: 26, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
				!44 = !DILocation(line: 26, column: 22, scope: !40)
				!45 = !DILocation(line: 28, column: 11, scope: !40)
				!46 = !DILocation(line: 27, column: 3, scope: !40)
				!47 = distinct !DISubprogram(name: "funcB", linkageName: "_Z5funcBi", scope: !3, file: !3, line: 32, type: !7, scopeLine: 32, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
				!51 = !DILocation(line: 33, column: 22, scope: !47)
				!52 = !DILocation(line: 33, column: 11, scope: !47)
				!53 = !DILocation(line: 35, column: 3, scope: !47)
				!54 = distinct !DISubprogram(name: "funcLeaf", linkageName: "_Z8funcLeafi", scope: !3, file: !3, line: 48, type: !7, scopeLine: 48, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
				!57 = !DILocation(line: 49, column: 9, scope: !58)
				!58 = distinct !DILexicalBlock(scope: !54, file: !3, line: 49, column: 7)
				!59 = !DILocation(line: 49, column: 7, scope: !54)
				!60 = !DILocation(line: 58, column: 14, scope: !61)
				!61 = !DILexicalBlockFile(scope: !62, file: !3, discriminator: 2)
				!62 = distinct !DILexicalBlock(scope: !58, file: !3, line: 56, column: 8)
				!63 = !DILocation(line: 58, column: 5, scope: !61)
				!64 = !DILocation(line: 52, column: 16, scope: !65)
				!65 = distinct !DILexicalBlock(scope: !66, file: !3, line: 51, column: 19)
				!66 = distinct !DILexicalBlock(scope: !58, file: !3, line: 49, column: 14)
				!67 = !DILocation(line: 52, column: 12, scope: !65)
				!68 = !DILocation(line: 52, column: 9, scope: !65)
				!69 = !DILocation(line: 51, column: 14, scope: !70)
				!70 = !DILexicalBlockFile(scope: !66, file: !3, discriminator: 2)
				!71 = !DILocation(line: 51, column: 5, scope: !70)
				!72 = !DILocation(line: 59, column: 16, scope: !73)
				!73 = distinct !DILexicalBlock(scope: !62, file: !3, line: 58, column: 19)
				!74 = !DILocation(line: 59, column: 12, scope: !73)
				!75 = !DILocation(line: 59, column: 9, scope: !73)
				!76 = !DILocation(line: 63, column: 3, scope: !54)
				!77 = distinct !DISubprogram(name: "funcB", linkageName: "_Z3fibi", scope: !3, file: !3, line: 32, type: !7, scopeLine: 32, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
				!78 = !DILocation(line: 33, column: 22, scope: !77)
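
The test header above says that adding call edges from the profile changes the top-down processing order. The following is a minimal sketch of that idea only, not the loader's implementation: it computes a reverse post-order over a small call graph modeled on this test's functions, first with the static edges alone and then with one extra edge. The helper names (`staticCallGraph`, `withProfileEdge`, `topDownOrder`, `indexOf`) are hypothetical, and the profile edge `_Z5funcAi -> _Z3fibi` (the indirect call through `@fp`) is an assumption made for illustration, since the contents of `profile-topdown-order.prof` are not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Caller name -> list of callee names.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Direct call edges visible in the IR above. The indirect call in
// _Z5funcAi (through @fp) has no static edge.
CallGraph staticCallGraph() {
  return {
      {"main", {"_Z5funcBi", "_Z5funcAi"}},
      {"_Z5funcAi", {"_Z8funcLeafi"}},
      {"_Z8funcLeafi", {"_Z5funcAi", "_Z5funcBi"}},
      {"_Z5funcBi", {"_Z8funcLeafi"}},
      {"_Z3fibi", {}},
  };
}

// Functions in the order they are defined in the module.
std::vector<std::string> moduleOrder() {
  return {"main", "_Z5funcAi", "_Z8funcLeafi", "_Z5funcBi", "_Z3fibi"};
}

// Returns a copy of G with one extra caller -> callee edge, standing in
// for an edge discovered in the profile (hypothetical helper).
CallGraph withProfileEdge(CallGraph G, const std::string &Caller,
                          const std::string &Callee) {
  G[Caller].push_back(Callee);
  return G;
}

// Reverse post-order of a depth-first walk started from each root in turn.
// Outside of cycles, a caller is placed before the callees first reached
// through it.
std::vector<std::string> topDownOrder(const CallGraph &G,
                                      const std::vector<std::string> &Roots) {
  std::vector<std::string> PostOrder;
  std::set<std::string> Visited;
  std::function<void(const std::string &)> Visit =
      [&](const std::string &F) {
        if (!Visited.insert(F).second)
          return; // already visited
        auto It = G.find(F);
        if (It != G.end())
          for (const std::string &Callee : It->second)
            Visit(Callee);
        PostOrder.push_back(F);
      };
  for (const std::string &Root : Roots)
    Visit(Root);
  return std::vector<std::string>(PostOrder.rbegin(), PostOrder.rend());
}

// Position of Name in Order (Order.size() if absent).
std::size_t indexOf(const std::vector<std::string> &Order,
                    const std::string &Name) {
  return static_cast<std::size_t>(
      std::find(Order.begin(), Order.end(), Name) - Order.begin());
}
```

In this sketch, `_Z3fibi` is unreachable in the static graph and ends up ahead of `_Z5funcAi`. Once the assumed profile edge is added, it is placed after its caller. The actual pass works on SCCs of the call graph, so the exact orders it produces may differ from this sketch.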

Revision Contents

Diff 323116
