This is an archive of the discontinued LLVM Phabricator instance.

Differential D30053

Add function importing info from samplepgo profile to the module summary.
Closed, Public

Authored by danielcdh on Feb 16 2017, 1:29 PM.

Details

Reviewers

tejohnson
mehdi_amini

Commits

rGa60cdd38812d: Add function importing info from samplepgo profile to the module summary.
rL296498: Add function importing info from samplepgo profile to the module summary.

Summary

For SamplePGO, the profile may contain cross-module inline stacks. Because profile annotation must happen only after all the hot inline stacks have been expanded, we need to pass this information to the module importer so that it can import the required functions. This patch implements the feature by emitting cross-module targets as part of the function entry metadata. In the module-summary phase, the metadata is used to build call edges that point to the functions that need to be imported.
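
For illustration only, the mechanism described in the summary can be sketched with a toy model. This is not the LLVM API: `EntryCountMD`, `Hotness`, and `addImportEdges` are hypothetical stand-ins for the `!prof` node, the summary's hotness type, and the loop over `getImportGUIDs()` that the patch adds to the module-summary analysis. The sketch assumes the count is the first numeric operand and that every later operand is an import GUID.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a "function_entry_count" !prof node (hypothetical type,
// not llvm::MDNode). Operands holds [count, guid, guid, ...].
struct EntryCountMD {
  std::string Name;
  std::vector<uint64_t> Operands;
};

// Simplified hotness levels (hypothetical, modeled on the summary's notion).
enum class Hotness { Unknown, Cold, Hot };

// Return every operand after the count, i.e. the GUIDs to import.
std::set<uint64_t> getImportGUIDs(const EntryCountMD &MD) {
  std::set<uint64_t> Guids;
  if (MD.Name != "function_entry_count")
    return Guids;
  assert(!MD.Operands.empty() && "entry count operand is missing");
  for (std::size_t I = 1; I < MD.Operands.size(); ++I)
    Guids.insert(MD.Operands[I]);
  return Guids;
}

// Record one hot call edge per import GUID. The toy simply overwrites any
// existing entry with Hot.
void addImportEdges(const EntryCountMD &MD,
                    std::map<uint64_t, Hotness> &Edges) {
  for (uint64_t Guid : getImportGUIDs(MD))
    Edges[Guid] = Hotness::Hot;
}
```

Calling `addImportEdges` on a node with operands `{2590, 111, 222}` leaves two edges, for GUIDs 111 and 222, both marked hot; a node with any other counter name contributes no edges.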

Diff Detail

Build Status

Buildable 4057
Build 4057: arc lint + arc unit

Event Timeline

danielcdh created this revision. Feb 16 2017, 1:29 PM

Harbormaster completed remote builds in B4052: Diff 88774. Feb 16 2017, 1:29 PM

You need to update: http://llvm.org/docs/BranchWeightMetadata.html

test/Transforms/SampleProfile/import.ll
11	Can you document what this is in the test? Also document what the test itself is doing.

update

mehdi_amini added inline comments. Feb 16 2017, 3:01 PM

lib/Analysis/ModuleSummaryAnalysis.cpp
188	I'm still unsure why we need this side channel to communicate the hotness when we already have some infrastructure above. Why isn't SamplePGO integrated into the general PGO infrastructure?

danielcdh added inline comments. Feb 16 2017, 3:05 PM

lib/Analysis/ModuleSummaryAnalysis.cpp
188	The root cause is the profile annotation mechanism is different. SamplePGO needs to make sure that before profile annotation, the IR resembles the hot portion of the profiling binary. As a result, we need to explicitly inline functions before profile annotation. But in order to inline, we need to first have it imported. I can go into more details with an example if you want. Please let me know.

mehdi_amini added inline comments. Feb 16 2017, 4:01 PM

lib/Analysis/ModuleSummaryAnalysis.cpp
188	I'm still not fond of this "side channel"; I'd rather have a solution that annotates hot calls in the IR instead, one that would be agnostic to the source of information we derive this "hotness" from.

danielcdh added inline comments. Feb 16 2017, 4:49 PM

lib/Analysis/ModuleSummaryAnalysis.cpp
188	I think we already have a side channel for ICP? The issue with marking a "hot call" is that it only goes down one level. For a chain such as foo_in_a_cc()->bar_in_b_cc()->baz_in_c_cc(), marking the hot call may only take us one level down, to include bar_in_b_cc. But in practice, we need both levels to be imported and inlined.

mehdi_amini added inline comments. Feb 16 2017, 4:51 PM

lib/Analysis/ModuleSummaryAnalysis.cpp
188	I don't see how having one level at a time is an issue. The two calls can be marked hot and considered independently when building the chain.

danielcdh added inline comments. Feb 16 2017, 4:58 PM

lib/Analysis/ModuleSummaryAnalysis.cpp
188	The issue is that, when compiling bar_in_b_cc(), the profile for foo_in_a_cc() is not visible. If the inlined instance is the only instance of bar_in_b_cc, the standalone copy of bar_in_b_cc may have no profile and be considered cold, so the callsite to baz_in_c_cc may never be marked hot.

mehdi_amini added inline comments. Feb 16 2017, 5:04 PM

lib/Analysis/ModuleSummaryAnalysis.cpp
188	OK, it is getting clearer. It is interesting because it means we have some sort of "context sensitivity": we're not recording that `bar_in_b_cc()->baz_in_c_cc()` is hot in the absolute, but only when called from `foo_in_a_cc()`. It is still not great to add edges in the call graph to model this. You're not recording the information described above, IIUC, but you're recording a "hot call" from `foo_in_a_cc()` to `bar_in_b_cc()`. I'd have to think about possible unintended consequences of what it means for the summary representation and the analyses.

danielcdh added inline comments. Feb 16 2017, 5:12 PM

lib/Analysis/ModuleSummaryAnalysis.cpp
188	Yeah, context sensitivity is one big benefit of SamplePGO, and it gets even better when it comes to iterative compilation: more iterations will introduce richer context, and in later iterations the real inline pass will be a no-op, and SamplePGO will perform perfect top-down inlining to capture all possible contexts. Please let me know if you can think of better ways to force importing functions by simply looking at the profile. One thing I can think of is to pass the profile to thin_link, but that seems to add a lot of complexity.
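
For illustration only, the context-sensitivity point in this thread can be sketched with a toy profile keyed by inline stack. This is an assumption-laden simplification: the real sample profile is nested rather than flattened, and `InlineStack`, `ContextProfile`, `standaloneSamples`, and `samplesInAnyContext` are hypothetical names that do not exist in LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// An inline stack lists the outermost function first, e.g.
// {"foo_in_a_cc", "bar_in_b_cc"} means bar was inlined into foo.
using InlineStack = std::vector<std::string>;
// Toy profile: sample count per inline stack (hypothetical layout).
using ContextProfile = std::map<InlineStack, uint64_t>;

// Samples attributed to Fn as a standalone (outlined) function.
uint64_t standaloneSamples(const ContextProfile &P, const std::string &Fn) {
  auto It = P.find(InlineStack{Fn});
  return It == P.end() ? 0 : It->second;
}

// Samples attributed to Fn in any context, including inlined instances.
uint64_t samplesInAnyContext(const ContextProfile &P, const std::string &Fn) {
  uint64_t Sum = 0;
  for (const auto &Entry : P) {
    assert(!Entry.first.empty() && "an inline stack names at least one function");
    if (Entry.first.back() == Fn)
      Sum += Entry.second;
  }
  return Sum;
}
```

With a profile that only contains the stacks `{foo_in_a_cc}`, `{foo_in_a_cc, bar_in_b_cc}`, and `{foo_in_a_cc, bar_in_b_cc, baz_in_c_cc}`, `standaloneSamples` returns 0 for `bar_in_b_cc` even though it has samples in foo's context. That is the situation in which the module defining `bar_in_b_cc` cannot mark its own call to `baz_in_c_cc` as hot.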

Is it better to introduce a new meta data for this, e.g. MD_inline_instance_imports ? Can this information be directly communicated to function importer instead of relying on modifying Callgraph/profile data?

In D30053#679264, @davidxl wrote:

Is it better to introduce a new meta data for this, e.g. MD_inline_instance_imports ? Can this information be directly communicated to function importer instead of relying on modifying Callgraph/profile data?

We need to expose this to the thin-link for the distributed mode, otherwise the dependent files the function importer would want to import from may not be available. So the summary-based importing analysis will need to be aware of this. Right now the solution is to trick it like we do for ICP by adding "fake" edges. The difference is that for ICP we just make indirect edges direct, but we don't really cheat on the graph structure otherwise. Here we'd create "short-circuit" edges (adding an edge foo->bar when in fact the reality is foo->baz->bar).

In D30053#679264, @davidxl wrote:

Is it better to introduce a new meta data for this, e.g. MD_inline_instance_imports ?

I'm fine with that. But the function definition only allows one !prof metadata. So MD_inline_instance_imports can not use !prof, what should it use?

Can this information be directly communicated to function importer instead of relying on modifying Callgraph/profile data?

It needs to be part of module summary. I suppose function importer cannot have access to IR?

In D30053#679300, @mehdi_amini wrote:

In D30053#679264, @davidxl wrote:

Is it better to introduce a new meta data for this, e.g. MD_inline_instance_imports ? Can this information be directly communicated to function importer instead of relying on modifying Callgraph/profile data?

We need to expose this to the thin-link for the distributed mode, otherwise the dependent files the function importer would want to import from may not be available. So the summary-based importing analysis will need to be aware of this. Right now the solution is to trick it like we do for ICP by adding "fake" edges. The difference is that for ICP we just make indirect edges direct, but we don't really cheat on the graph structure otherwise. Here we'd create "short-circuit" edges (adding an edge foo->bar when in fact the reality is foo->baz->bar).

It isn't just for the distributed mode - the Thin Link needs to know this information so that it can appropriately mark references as exported. We need to make all importing decisions in the Thin Link, not the ThinLTO backends, so it needs to be in the summary in some form. Adding the direct edges for the transitive edges seemed like the cleanest way to do this as it doesn't require any changes to the function importing decisions during the thin link, rather than adding a new mechanism.

docs/BranchWeightMetadata.rst
144	Think this needs more explanation with a small example like the one you gave in the review discussion.
include/llvm/IR/Function.h
212	add something like "for sample PGO, to enable the same inlines as the profiled optimized binary".
222	ditto
include/llvm/IR/MDBuilder.h
70	Document the new parameter.
include/llvm/ProfileData/SampleProf.h
309	Since this isn't actually doing any importing, I think it would be clearer to rename to something like "findImportedFunctions" (i.e. it is identifying functions that were presumably imported and inlined in the profiled binary).
lib/Analysis/ModuleSummaryAnalysis.cpp
186	Needs comment
188	> It is still not great to add edges in the call graph to model this. You're not recording the information described above, IIUC, but you're recording a "hot call" from foo_in_a_cc() to bar_in_b_cc(). I'd have to think about possible unintended consequences of what it means for the summary representation and the analyses.
	In the case Dehao gave, which is foo_in_a_cc()->bar_in_b_cc()->baz_in_c_cc(), we presumably already have an edge in the call graph from foo_in_a_cc to bar_in_b_cc; this is just forcing that edge to be "hot" to force the import. The added edge that wouldn't have been there before would be foo_in_a_cc to baz_in_c_cc, and there we are adding a direct edge for what is a transitive dependence. I don't think this should cause any issues, since the other things we use the edges for are things like liveness and internalization, which aren't changed by adding a direct edge for the transitive dependence.
lib/IR/Function.cpp
1305	When would we have the metadata but not the operand?
lib/Transforms/IPO/SampleProfile.cpp
610	Document new parameter
1285	note that this is also recording the GUIDs that need to be imported for the IR to match
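
For illustration only, the traversal that the SampleProf.h comment above refers to can be sketched on a toy profile tree. The names here are hypothetical: `ToySamples` is not `FunctionSamples`, the sketch collects function names instead of GUIDs, and "defined in this module" is modeled as a plain set of names rather than the patch's lookup through the module. The function name follows the reviewer's suggestion.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Toy profile node (hypothetical): a function, its total sample count, and
// the profiles of callees that were inlined into it in the profiled binary.
struct ToySamples {
  std::string Name;
  uint64_t TotalSamples;
  std::vector<ToySamples> Callsites;
};

// Recursively collect functions that are hot enough and not defined locally.
// As in the patch, a node with TotalSamples <= Threshold is skipped together
// with everything inlined beneath it.
void findImportedFunctions(const ToySamples &FS,
                           const std::set<std::string> &DefinedInModule,
                           uint64_t Threshold, std::set<std::string> &Out) {
  if (FS.TotalSamples <= Threshold)
    return;
  if (DefinedInModule.count(FS.Name) == 0)
    Out.insert(FS.Name);
  for (const ToySamples &CS : FS.Callsites)
    findImportedFunctions(CS, DefinedInModule, Threshold, Out);
}
```

Starting from a profile for `foo_in_a_cc` with `bar_in_b_cc` and `baz_in_c_cc` nested beneath it, and only `foo_in_a_cc` defined locally, the result contains both `bar_in_b_cc` and `baz_in_c_cc`. This is the two-level import discussed in the thread.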

update

lib/IR/Function.cpp
1305	Shouldn't happen. Removed the 2nd check.

In D30053#687441, @tejohnson wrote:

In D30053#679300, @mehdi_amini wrote:

In D30053#679264, @davidxl wrote:

Is it better to introduce a new meta data for this, e.g. MD_inline_instance_imports ? Can this information be directly communicated to function importer instead of relying on modifying Callgraph/profile data?

We need to expose this to the thin-link for the distributed mode, otherwise the dependent files the function importer would want to import from may not be available. So the summary-based importing analysis will need to be aware of this. Right now the solution is to trick it like we do for ICP by adding "fake" edges. The difference is that for ICP we just make indirect edges direct, but we don't really cheat on the graph structure otherwise. Here we'd create "short-circuit" edges (adding an edge foo->bar when in fact the reality is foo->baz->bar).

It isn't just for the distributed mode - the Thin Link needs to know this information so that it can appropriately mark references as exported. We need to make all importing decisions in the Thin Link, not the ThinLTO backends,

In practice, one could still import in the backend based on IR information if it wasn't for the distributed mode (and because I've been strongly against doing this anyway for layering reason). That's why I talked about the distributed mode to simplify :)

Adding the direct edges for the transitive edges seemed like the cleanest way to do this as it doesn't require any changes to the function importing decisions during the thin link, rather than adding a new mechanism.

I'm not convinced: while I agree this is the least intrusive way to implement this functionality, I don't find it "clean": we're producing a graph that is less accurate than before.

lib/Analysis/ModuleSummaryAnalysis.cpp
188	> we are adding a direct edge for what is a transitive dependence.
	Right.
	> I don't think this should cause any issues, since the other things we use the edges for are things like liveness and internalization, which aren't changed by adding a direct edge for the transitive dependence.
	I don't see any issue right now, but that still means that our call graph is no longer accurate, and that makes me worried about long-term consequences and what kind of analysis or heuristic could be affected by this. I'm concerned about cases like this, where we break some invariant of a data structure or a component: while it seems fine at the moment, it may bite in the future. That said, the alternative (adding explicit handling for this) seems overkill at this point, so that's fine with me.

LGTM

docs/BranchWeightMetadata.rst
144	I would explicitly add that it is needed because the sampling based profile was collected on a binary that had already imported and inlined these functions, and we need to ensure the IR matches in the ThinLTO backends for profile annotation.
lib/Analysis/ModuleSummaryAnalysis.cpp
188	Agree that we generally want the call graph to be accurate. I think there shouldn't ever be a correctness issue though when the call graph is more "conservative" as it will be here, with the additional edges. Note that in some sense the edges added for indirect calls can be conservative too - we may not actually call those same functions if the input changes.
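
For illustration only, the argument that a direct edge for a transitive dependence does not change liveness can be checked on a toy call graph. The sketch assumes liveness is modeled as plain reachability from a root; `Graph` and `reachableFrom` are hypothetical names, not part of the summary index.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph: caller name -> set of callee names (hypothetical model).
using Graph = std::map<std::string, std::set<std::string>>;

// Return every function reachable from Root, including Root itself.
std::set<std::string> reachableFrom(const Graph &G, const std::string &Root) {
  std::set<std::string> Seen{Root};
  std::vector<std::string> Worklist{Root};
  while (!Worklist.empty()) {
    std::string Node = Worklist.back();
    Worklist.pop_back();
    auto It = G.find(Node);
    if (It == G.end())
      continue; // Leaf function with no recorded callees.
    for (const std::string &Callee : It->second)
      if (Seen.insert(Callee).second)
        Worklist.push_back(Callee);
  }
  return Seen;
}
```

With edges foo_in_a_cc->bar_in_b_cc and bar_in_b_cc->baz_in_c_cc, adding the short-circuit edge foo_in_a_cc->baz_in_c_cc leaves the reachable set from foo_in_a_cc unchanged. Under this model the extra edge affects only which edges are treated as hot.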

This revision is now accepted and ready to land. Feb 28 2017, 6:43 AM

update

danielcdh closed this revision. Feb 28 2017, 10:21 AM

Revision Contents

Path                                                                Size
docs/BranchWeightMetadata.rst                                       14 lines
include/llvm/IR/Function.h                                          10 lines
include/llvm/IR/MDBuilder.h                                         5 lines
include/llvm/ProfileData/SampleProf.h                               17 lines
lib/Analysis/ModuleSummaryAnalysis.cpp                              3 lines
lib/IR/Function.cpp                                                 18 lines
lib/IR/MDBuilder.cpp                                                13 lines
lib/IR/Verifier.cpp                                                 4 lines
lib/Transforms/IPO/SampleProfile.cpp                                23 lines
test/Bitcode/thinlto-function-summary-callgraph-profile-summary.ll  5 lines
test/Transforms/SampleProfile/Inputs/import.prof                    4 lines
test/Transforms/SampleProfile/import.ll                             31 lines
test/Verifier/function-metadata-bad.ll                              2 lines
test/Verifier/metadata-function-prof.ll                             2 lines

Diff 88790

docs/BranchWeightMetadata.rst

	Show First 20 Lines • Show All 117 Lines • ▼ Show 20 Lines

	Function Entry Counts			Function Entry Counts
	=====================			=====================

	To allow comparing different functions during inter-procedural analysis and			To allow comparing different functions during inter-procedural analysis and
	optimization, ``MD_prof`` nodes can also be assigned to a function definition.			optimization, ``MD_prof`` nodes can also be assigned to a function definition.
	The first operand is a string indicating the name of the associated counter.			The first operand is a string indicating the name of the associated counter.

	Currently, one counter is supported: "function_entry_count". This is a 64-bit			Currently, one counter is supported: "function_entry_count". The second operand
	counter that indicates the number of times that this function was invoked (in			is a 64-bit counter that indicates the number of times that this function was
	the case of instrumentation-based profiles). In the case of sampling-based			invoked (in the case of instrumentation-based profiles). In the case of
	profiles, this counter is an approximation of how many times the function was			sampling-based profiles, this operand is an approximation of how many times
	invoked.			the function was invoked.

	For example, in the code below, the instrumentation for function foo()			For example, in the code below, the instrumentation for function foo()
	indicates that it was called 2,590 times at runtime.			indicates that it was called 2,590 times at runtime.

	.. code-block:: llvm			.. code-block:: llvm

	define i32 @foo() !prof !1 {			define i32 @foo() !prof !1 {
	ret i32 0			ret i32 0
	}			}
	!1 = !{!"function_entry_count", i64 2590}			!1 = !{!"function_entry_count", i64 2590}

				If "function_entry_count" has more than 2 operands, the later operands are
				the GUID of the functions that needs to be imported by ThinLTO. This is only
				set by sampling based profile.

include/llvm/IR/Function.h

Show All 12 Lines
// A function basically consists of a list of basic blocks, a list of arguments,		// A function basically consists of a list of basic blocks, a list of arguments,
// and a symbol table.		// and a symbol table.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_IR_FUNCTION_H		#ifndef LLVM_IR_FUNCTION_H
#define LLVM_IR_FUNCTION_H		#define LLVM_IR_FUNCTION_H

		#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/ilist_node.h"		#include "llvm/ADT/ilist_node.h"
#include "llvm/ADT/iterator_range.h"		#include "llvm/ADT/iterator_range.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/IR/Argument.h"		#include "llvm/IR/Argument.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CallingConv.h"		#include "llvm/IR/CallingConv.h"
#include "llvm/IR/GlobalObject.h"		#include "llvm/IR/GlobalObject.h"
▲ Show 20 Lines • Show All 173 Lines • ▼ Show 20 Lines	public:
void removeFnAttr(StringRef Kind) {		void removeFnAttr(StringRef Kind) {
setAttributes(AttributeSets.removeAttribute(		setAttributes(AttributeSets.removeAttribute(
getContext(), AttributeSet::FunctionIndex, Kind));		getContext(), AttributeSet::FunctionIndex, Kind));
}		}

/// \brief Set the entry count for this function.		/// \brief Set the entry count for this function.
///		///
/// Entry count is the number of times this function was executed based on		/// Entry count is the number of times this function was executed based on
/// pgo data.		/// pgo data. \p Imports points to a set of GUIDs that needs to be imported
void setEntryCount(uint64_t Count);		/// by the function.
		void setEntryCount(uint64_t Count,
		const DenseSet<GlobalValue::GUID> *Imports = nullptr);

/// \brief Get the entry count for this function.		/// \brief Get the entry count for this function.
///		///
/// Entry count is the number of times the function was executed based on		/// Entry count is the number of times the function was executed based on
/// pgo data.		/// pgo data.
Optional<uint64_t> getEntryCount() const;		Optional<uint64_t> getEntryCount() const;

		/// Returns the set of GUIDs that needs to be imported to the function.
		DenseSet<GlobalValue::GUID> getImportGUIDs() const;

/// Set the section prefix for this function.		/// Set the section prefix for this function.
void setSectionPrefix(StringRef Prefix);		void setSectionPrefix(StringRef Prefix);

/// Get the section prefix for this function.		/// Get the section prefix for this function.
Optional<StringRef> getSectionPrefix() const;		Optional<StringRef> getSectionPrefix() const;

/// @brief Return true if the function has the attribute.		/// @brief Return true if the function has the attribute.
bool hasFnAttribute(Attribute::AttrKind Kind) const {		bool hasFnAttribute(Attribute::AttrKind Kind) const {
▲ Show 20 Lines • Show All 470 Lines • Show Last 20 Lines

include/llvm/IR/MDBuilder.h

Show All 9 Lines
// This file defines the MDBuilder class, which is used as a convenient way to		// This file defines the MDBuilder class, which is used as a convenient way to
// create LLVM metadata with a consistent and simplified interface.		// create LLVM metadata with a consistent and simplified interface.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_IR_MDBUILDER_H		#ifndef LLVM_IR_MDBUILDER_H
#define LLVM_IR_MDBUILDER_H		#define LLVM_IR_MDBUILDER_H

		#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
		#include "llvm/IR/GlobalValue.h"
#include "llvm/Support/DataTypes.h"		#include "llvm/Support/DataTypes.h"
#include <utility>		#include <utility>

namespace llvm {		namespace llvm {

class APInt;		class APInt;
template <typename T> class ArrayRef;		template <typename T> class ArrayRef;
class LLVMContext;		class LLVMContext;
Show All 32 Lines	public:

/// \brief Return metadata containing a number of branch weights.		/// \brief Return metadata containing a number of branch weights.
MDNode *createBranchWeights(ArrayRef<uint32_t> Weights);		MDNode *createBranchWeights(ArrayRef<uint32_t> Weights);

/// Return metadata specifying that a branch or switch is unpredictable.		/// Return metadata specifying that a branch or switch is unpredictable.
MDNode *createUnpredictable();		MDNode *createUnpredictable();

/// Return metadata containing the entry count for a function.		/// Return metadata containing the entry count for a function.
MDNode *createFunctionEntryCount(uint64_t Count);		MDNode *createFunctionEntryCount(uint64_t Count,
		const DenseSet<GlobalValue::GUID> *Imports);

/// Return metadata containing the section prefix for a function.		/// Return metadata containing the section prefix for a function.
MDNode *createFunctionSectionPrefix(StringRef Prefix);		MDNode *createFunctionSectionPrefix(StringRef Prefix);

//===------------------------------------------------------------------===//		//===------------------------------------------------------------------===//
// Range metadata.		// Range metadata.
//===------------------------------------------------------------------===//		//===------------------------------------------------------------------===//

▲ Show 20 Lines • Show All 92 Lines • Show Last 20 Lines

include/llvm/ProfileData/SampleProf.h

Show All 9 Lines
// This file contains common definitions used in the reading and writing of		// This file contains common definitions used in the reading and writing of
// sample profile data.		// sample profile data.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_PROFILEDATA_SAMPLEPROF_H_		#ifndef LLVM_PROFILEDATA_SAMPLEPROF_H_
#define LLVM_PROFILEDATA_SAMPLEPROF_H_		#define LLVM_PROFILEDATA_SAMPLEPROF_H_

		#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"		#include "llvm/ADT/StringMap.h"
		#include "llvm/IR/GlobalValue.h"
		#include "llvm/IR/Module.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorOr.h"		#include "llvm/Support/ErrorOr.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

#include <map>		#include <map>
#include <system_error>		#include <system_error>

namespace llvm {		namespace llvm {
▲ Show 20 Lines • Show All 267 Lines • ▼ Show 20 Lines	sampleprof_error merge(const FunctionSamples &Other, uint64_t Weight = 1) {
for (const auto &I : Other.getCallsiteSamples()) {		for (const auto &I : Other.getCallsiteSamples()) {
const LineLocation &Loc = I.first;		const LineLocation &Loc = I.first;
const FunctionSamples &Rec = I.second;		const FunctionSamples &Rec = I.second;
MergeResult(Result, functionSamplesAt(Loc).merge(Rec, Weight));		MergeResult(Result, functionSamplesAt(Loc).merge(Rec, Weight));
}		}
return Result;		return Result;
}		}

		/// Recursively traverses all children, if the corresponding function is
		/// not defined in module \p M, and its total sample is no less than
		/// \p Threshold, add its corresponding GUID to \p S.
		void importAllFunctions(DenseSet<GlobalValue::GUID> &S, const Module *M,
		uint64_t Threshold) const {
		if (TotalSamples <= Threshold)
		return;
		Function *F = M->getFunction(Name);
		if (!F || !F->getSubprogram())
		S.insert(Function::getGUID(Name));
		for (auto CS : CallsiteSamples)
		CS.second.importAllFunctions(S, M, Threshold);
		}

/// Set the name of the function.		/// Set the name of the function.
void setName(StringRef FunctionName) { Name = FunctionName; }		void setName(StringRef FunctionName) { Name = FunctionName; }

/// Return the function name.		/// Return the function name.
const StringRef &getName() const { return Name; }		const StringRef &getName() const { return Name; }

private:		private:
/// Mangled name of the function.		/// Mangled name of the function.
▲ Show 20 Lines • Show All 69 Lines • Show Last 20 Lines

lib/Analysis/ModuleSummaryAnalysis.cpp

Show First 20 Lines • Show All 177 Lines • ▼ Show 20 Lines	for (const Instruction &I : BB) {
ICallAnalysis.getPromotionCandidatesForInstruction(		ICallAnalysis.getPromotionCandidatesForInstruction(
&I, NumVals, TotalCount, NumCandidates);		&I, NumVals, TotalCount, NumCandidates);
for (auto &Candidate : CandidateProfileData)		for (auto &Candidate : CandidateProfileData)
CallGraphEdges[Candidate.Value].updateHotness(		CallGraphEdges[Candidate.Value].updateHotness(
getHotness(Candidate.Count, PSI));		getHotness(Candidate.Count, PSI));
}		}
}		}

		for (auto &I : F.getImportGUIDs())
		CallGraphEdges[I].updateHotness(CalleeInfo::HotnessType::Hot);

		mehdi_aminiUnsubmitted Not Done Reply Inline Actions I'm still unsure why we need this side channel to communicate the hotness, while we have above already some infrastructure? Why isn't samplePGO integrate in the general PGO infrastructure? mehdi_amini: I'm still unsure why we need this side channel to communicate the hotness, while we have above…
		danielcdhAuthorUnsubmitted Not Done Reply Inline Actions The root cause is the profile annotation mechanism is different. SamplePGO needs to make sure that before profile annotation, the IR resembles the hot portion of the profiling binary. As a result, we need to explicitly inline functions before profile annotation. But in order to inline, we need to first have it imported. I can go into more details with an example if you want. Please let me know. danielcdh: The root cause is the profile annotation mechanism is different. SamplePGO needs to make sure…
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions I'm still not fond of this "side channel", and I rather have a solution that annotates hot calls in the IR instead, that would be agnostic to the source of information we derive this "hotness" from. mehdi_amini: I'm still not fond of this "side channel", and I rather have a solution that annotates hot…
		danielcdhAuthorUnsubmitted Not Done Reply Inline Actions I think we already have a side channel for ICP? The issue for marking "hot call" is that it only goes down by 1 level, for the cases where foo_in_a_cc()->bar_in_b_cc()->baz_in_c_cc(), we may only go one level to include bar_in_b_cc if we mark the hot call. But in practice, we need both levels to be imported and inlined. danielcdh: I think we already have a side channel for ICP? The issue for marking "hot call" is that it…
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions I don't see how having one level at a time is an issue. The two calls can be marked hot and considered independently when building the chain. mehdi_amini: I don't see how having one level at a time is an issue. The two calls can be marked hot and…
		danielcdhAuthorUnsubmitted Not Done Reply Inline Actions The issue is, when compiling bar_in_b_cc(), the profile for foo_in_a_cc() is not visible. If the inline instance is the only instance for bar_in_b_cc, then the standalone copy of bar_in_b_cc may not have profile and is considered cold and thus may not be able to mark the callsite to baz_in_c_cc as hot. danielcdh: The issue is, when compiling bar_in_b_cc(), the profile for foo_in_a_cc() is not visible. If…
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions OK it is getting more clear: it is interesting because it means we have some sort of "context sensitivity" : we're not recording that `bar_in_b_cc()->baz_in_c_cc()` is hot in the absolute, but only when called from `foo_in_a_cc()`. It is still not great to add edges in the call graph to model this. You not recording the information described above IIUC, but you're recording a "hot call" from `foo_in_a_cc()` to `bar_in_b_cc()`. I'd have to think about possible unintended consequences from what it means for the summary representation and the analyses. mehdi_amini: OK it is getting more clear: it is interesting because it means we have some sort of "context…
		danielcdhAuthorUnsubmitted Not Done Reply Inline Actions Yeah, context-sensitivity is one big benefit of samplepgo, and it gets even better when comes to iterative compilation: more iteration will introduce richer context, and in later iterations, the real inline pass will be a noop, and samplepgo will perform perfect top-down inlining to capture all possible contexts. Please let me know if you can think of better ways to force importing functions by simply looking at the profile. One thing I can think of is to pass the profile to thin_link, but it seems to add much complexity. danielcdh: Yeah, context-sensitivity is one big benefit of samplepgo, and it gets even better when comes…
		tejohnsonUnsubmitted Not Done Reply Inline Actions It is still not great to add edges in the call graph to model this. You not recording the information described above IIUC, but you're recording a "hot call" from foo_in_a_cc() to bar_in_b_cc(). I'd have to think about possible unintended consequences from what it means for the summary representation and the analyses. In the case Dehao gave, which is foo_in_a_cc()->bar_in_b_cc()->baz_in_c_cc(), we presumably already have an edge in the call graph from foo_in_a_cc to bar_in_b_cc, this is just forcing that edge to be "hot" to force the import. The added edge that wouldn't have been there before would be foo_in_a_cc to baz_in_c_cc, and there we are adding a direct edge for what is a transitive dependence. I don't think this should cause any issues, since the other things we use the edges for are things like liveness and internalization, which aren't changed by adding a direct edge for the transitive dependence. tejohnson: > It is still not great to add edges in the call graph to model this. You not recording the…
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions we are adding a direct edge for what is a transitive dependence. Right. I don't think this should cause any issues, since the other things we use the edges for are things like liveness and internalization, which aren't changed by adding a direct edge for the transitive dependence. I don't see any issue right now, but that still means that our call-graph is not longer accurate, and that makes me worried about long-term consequences and what kind of analysis or heuristic could be affected by this. I'm concerned about things like that where we break some invariant of a data structure or a component and while it seems fine on the moment, it may bite in the future. That said the alternative (adding an explicit handling for this) seems overkill at this point, so that's fine with me. mehdi_amini: > we are adding a direct edge for what is a transitive dependence. Right. > I don't think…
tejohnson (inline comment): Agree that we generally want the call graph to be accurate. I think there shouldn't ever be a correctness issue, though, when the call graph is more "conservative", as it will be here with the additional edges. Note that in some sense the edges added for indirect calls can be conservative too: we may not actually call those same functions if the input changes.
bool NonRenamableLocal = isNonRenamableLocal(F);		bool NonRenamableLocal = isNonRenamableLocal(F);
bool NotEligibleForImport =		bool NotEligibleForImport =
NonRenamableLocal || HasInlineAsmMaybeReferencingInternal ||		NonRenamableLocal || HasInlineAsmMaybeReferencingInternal ||
// Inliner doesn't handle variadic functions.		// Inliner doesn't handle variadic functions.
// FIXME: refactor this to use the same code that inliner is using.		// FIXME: refactor this to use the same code that inliner is using.
F.isVarArg();		F.isVarArg();
GlobalValueSummary::GVFlags Flags(F.getLinkage(), NotEligibleForImport,		GlobalValueSummary::GVFlags Flags(F.getLinkage(), NotEligibleForImport,
/* LiveRoot = */ false);		/* LiveRoot = */ false);
[...]
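
The point made in the thread above, that a direct edge added for a transitive dependence does not change reachability-style results such as liveness, can be illustrated with a small standalone sketch. It does not use the LLVM summary data structures: `CallGraph`, `reachableFrom`, and `buildExample` are hypothetical names, and the node names simply reuse the foo_in_a_cc/bar_in_b_cc/baz_in_c_cc example from the discussion.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a summary call graph: caller -> set of callees.
using CallGraph = std::map<std::string, std::set<std::string>>;

// Collect every function reachable from Root, the kind of query a
// liveness computation performs over call edges.
std::set<std::string> reachableFrom(const CallGraph &G,
                                    const std::string &Root) {
  std::set<std::string> Seen{Root};
  std::vector<std::string> Worklist{Root};
  while (!Worklist.empty()) {
    std::string Cur = Worklist.back();
    Worklist.pop_back();
    auto It = G.find(Cur);
    if (It == G.end())
      continue;
    for (const std::string &Callee : It->second)
      if (Seen.insert(Callee).second)
        Worklist.push_back(Callee);
  }
  return Seen;
}

// Build the example chain foo -> bar -> baz; optionally add the extra
// direct foo -> baz edge that models the transitive dependence.
CallGraph buildExample(bool AddDirectEdge) {
  CallGraph G;
  G["foo_in_a_cc"].insert("bar_in_b_cc");
  G["bar_in_b_cc"].insert("baz_in_c_cc");
  if (AddDirectEdge)
    G["foo_in_a_cc"].insert("baz_in_c_cc");
  return G;
}
```

Under these assumptions the set reachable from `foo_in_a_cc` is the same with and without the extra edge; only analyses that depend on the exact edge set, rather than on reachability, would see a difference, which is the long-term concern raised above.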

lib/IR/Function.cpp

[...]
	void Function::setValueSubclassDataBit(unsigned Bit, bool On) {			void Function::setValueSubclassDataBit(unsigned Bit, bool On) {
	assert(Bit < 16 && "SubclassData contains only 16 bits");			assert(Bit < 16 && "SubclassData contains only 16 bits");
	if (On)			if (On)
	setValueSubclassData(getSubclassDataFromValue() | (1 << Bit));			setValueSubclassData(getSubclassDataFromValue() | (1 << Bit));
	else			else
	setValueSubclassData(getSubclassDataFromValue() & ~(1 << Bit));			setValueSubclassData(getSubclassDataFromValue() & ~(1 << Bit));
	}			}

	void Function::setEntryCount(uint64_t Count) {			void Function::setEntryCount(uint64_t Count,
				const DenseSet<GlobalValue::GUID> *S) {
	MDBuilder MDB(getContext());			MDBuilder MDB(getContext());
	setMetadata(LLVMContext::MD_prof, MDB.createFunctionEntryCount(Count));			setMetadata(LLVMContext::MD_prof, MDB.createFunctionEntryCount(Count, S));
	}			}

	Optional<uint64_t> Function::getEntryCount() const {			Optional<uint64_t> Function::getEntryCount() const {
	MDNode *MD = getMetadata(LLVMContext::MD_prof);			MDNode *MD = getMetadata(LLVMContext::MD_prof);
	if (MD && MD->getOperand(0))			if (MD && MD->getOperand(0))
	if (MDString *MDS = dyn_cast<MDString>(MD->getOperand(0)))			if (MDString *MDS = dyn_cast<MDString>(MD->getOperand(0)))
	if (MDS->getString().equals("function_entry_count")) {			if (MDS->getString().equals("function_entry_count")) {
	ConstantInt *CI = mdconst::extract<ConstantInt>(MD->getOperand(1));			ConstantInt *CI = mdconst::extract<ConstantInt>(MD->getOperand(1));
	uint64_t Count = CI->getValue().getZExtValue();			uint64_t Count = CI->getValue().getZExtValue();
	if (Count == 0)			if (Count == 0)
	return None;			return None;
	return Count;			return Count;
	}			}
	return None;			return None;
	}			}

				DenseSet<GlobalValue::GUID> Function::getImportGUIDs() const {
				DenseSet<GlobalValue::GUID> R;
				MDNode *MD = getMetadata(LLVMContext::MD_prof);
				if (MD && MD->getOperand(0))
tejohnson (inline comment): When would we have the metadata but not the operand?
danielcdh (author, inline comment): Shouldn't happen. Removed the second check.
				if (MDString *MDS = dyn_cast<MDString>(MD->getOperand(0)))
				if (MDS->getString().equals("function_entry_count"))
				for (unsigned i = 2; i < MD->getNumOperands(); i++)
				R.insert(mdconst::extract<ConstantInt>(MD->getOperand(i))
				->getValue()
				.getZExtValue());
				return R;
				}

	void Function::setSectionPrefix(StringRef Prefix) {			void Function::setSectionPrefix(StringRef Prefix) {
	MDBuilder MDB(getContext());			MDBuilder MDB(getContext());
	setMetadata(LLVMContext::MD_section_prefix,			setMetadata(LLVMContext::MD_section_prefix,
	MDB.createFunctionSectionPrefix(Prefix));			MDB.createFunctionSectionPrefix(Prefix));
	}			}

	Optional<StringRef> Function::getSectionPrefix() const {			Optional<StringRef> Function::getSectionPrefix() const {
	if (MDNode *MD = getMetadata(LLVMContext::MD_section_prefix)) {			if (MDNode *MD = getMetadata(LLVMContext::MD_section_prefix)) {
	assert(dyn_cast<MDString>(MD->getOperand(0))			assert(dyn_cast<MDString>(MD->getOperand(0))
	->getString()			->getString()
	.equals("function_section_prefix") &&			.equals("function_section_prefix") &&
	"Metadata not match");			"Metadata not match");
	return dyn_cast<MDString>(MD->getOperand(1))->getString();			return dyn_cast<MDString>(MD->getOperand(1))->getString();
	}			}
	return None;			return None;
	}			}

lib/IR/MDBuilder.cpp

[...] MDNode *MDBuilder::createBranchWeights(ArrayRef<uint32_t> Weights) {

return MDNode::get(Context, Vals);		return MDNode::get(Context, Vals);
}		}

MDNode *MDBuilder::createUnpredictable() {		MDNode *MDBuilder::createUnpredictable() {
return MDNode::get(Context, None);		return MDNode::get(Context, None);
}		}

MDNode *MDBuilder::createFunctionEntryCount(uint64_t Count) {		MDNode *MDBuilder::createFunctionEntryCount(
		uint64_t Count, const DenseSet<GlobalValue::GUID> *Imports) {
Type *Int64Ty = Type::getInt64Ty(Context);		Type *Int64Ty = Type::getInt64Ty(Context);
return MDNode::get(Context,		SmallVector<Metadata *, 8> Ops;
{createString("function_entry_count"),		Ops.push_back(createString("function_entry_count"));
createConstant(ConstantInt::get(Int64Ty, Count))});		Ops.push_back(createConstant(ConstantInt::get(Int64Ty, Count)));
		if (Imports)
		for (auto ID : *Imports)
		Ops.push_back(createConstant(ConstantInt::get(Int64Ty, ID)));
		return MDNode::get(Context, Ops);
}		}

MDNode *MDBuilder::createFunctionSectionPrefix(StringRef Prefix) {		MDNode *MDBuilder::createFunctionSectionPrefix(StringRef Prefix) {
return MDNode::get(Context,		return MDNode::get(Context,
{createString("function_section_prefix"),		{createString("function_section_prefix"),
createString(Prefix)});		createString(Prefix)});
}		}

[...]
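
The operand layout shared by `createFunctionEntryCount` and `getImportGUIDs` above (operand 0 is the string `function_entry_count`, operand 1 is the count, operands 2 and up are the GUIDs to import) can be mimicked without LLVM. This is a minimal sketch only: `ProfNode`, `makeEntryCount`, and `importGUIDs` are hypothetical names standing in for `MDNode` and the two functions in the patch, and `std::set` replaces `DenseSet`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the !prof MDNode: a name plus i64 operands.
struct ProfNode {
  std::string Name;               // corresponds to operand 0
  std::vector<uint64_t> Operands; // corresponds to operands 1..N
};

// Same shape as createFunctionEntryCount: the count first, then one
// operand per GUID if a set of imports was supplied.
ProfNode makeEntryCount(uint64_t Count, const std::set<uint64_t> *Imports) {
  ProfNode N{"function_entry_count", {Count}};
  if (Imports)
    N.Operands.insert(N.Operands.end(), Imports->begin(), Imports->end());
  return N;
}

// Same shape as getImportGUIDs: skip the count and collect the rest.
std::set<uint64_t> importGUIDs(const ProfNode &N) {
  std::set<uint64_t> R;
  if (N.Name != "function_entry_count")
    return R;
  for (std::size_t I = 1; I < N.Operands.size(); ++I)
    R.insert(N.Operands[I]);
  return R;
}
```

Passing a null set yields the original two-operand form, which is why the verifier check in this revision is relaxed from exactly two operands to at least two.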

lib/IR/Verifier.cpp

[...] void Verifier::verifyFunctionAttrs(FunctionType *FT, AttributeSet Attrs,
}		}
}		}

void Verifier::verifyFunctionMetadata(		void Verifier::verifyFunctionMetadata(
ArrayRef<std::pair<unsigned, MDNode *>> MDs) {		ArrayRef<std::pair<unsigned, MDNode *>> MDs) {
for (const auto &Pair : MDs) {		for (const auto &Pair : MDs) {
if (Pair.first == LLVMContext::MD_prof) {		if (Pair.first == LLVMContext::MD_prof) {
MDNode *MD = Pair.second;		MDNode *MD = Pair.second;
Assert(MD->getNumOperands() == 2,		Assert(MD->getNumOperands() >= 2,
"!prof annotations should have exactly 2 operands", MD);		"!prof annotations should have no less than 2 operands", MD);

// Check first operand.		// Check first operand.
Assert(MD->getOperand(0) != nullptr, "first operand should not be null",		Assert(MD->getOperand(0) != nullptr, "first operand should not be null",
MD);		MD);
Assert(isa<MDString>(MD->getOperand(0)),		Assert(isa<MDString>(MD->getOperand(0)),
"expected string with name of the !prof annotation", MD);		"expected string with name of the !prof annotation", MD);
MDString *MDS = cast<MDString>(MD->getOperand(0));		MDString *MDS = cast<MDString>(MD->getOperand(0));
StringRef ProfName = MDS->getString();		StringRef ProfName = MDS->getString();
[...]

lib/Transforms/IPO/SampleProfile.cpp

[...]
protected:		protected:
bool runOnFunction(Function &F);		bool runOnFunction(Function &F);
unsigned getFunctionLoc(Function &F);		unsigned getFunctionLoc(Function &F);
bool emitAnnotations(Function &F);		bool emitAnnotations(Function &F);
ErrorOr<uint64_t> getInstWeight(const Instruction &I);		ErrorOr<uint64_t> getInstWeight(const Instruction &I);
ErrorOr<uint64_t> getBlockWeight(const BasicBlock *BB);		ErrorOr<uint64_t> getBlockWeight(const BasicBlock *BB);
const FunctionSamples *findCalleeFunctionSamples(const Instruction &I) const;		const FunctionSamples *findCalleeFunctionSamples(const Instruction &I) const;
const FunctionSamples *findFunctionSamples(const Instruction &I) const;		const FunctionSamples *findFunctionSamples(const Instruction &I) const;
bool inlineHotFunctions(Function &F);		bool inlineHotFunctions(Function &F,
		DenseSet<GlobalValue::GUID> &ImportGUIDs);
void printEdgeWeight(raw_ostream &OS, Edge E);		void printEdgeWeight(raw_ostream &OS, Edge E);
void printBlockWeight(raw_ostream &OS, const BasicBlock *BB) const;		void printBlockWeight(raw_ostream &OS, const BasicBlock *BB) const;
void printBlockEquivalence(raw_ostream &OS, const BasicBlock *BB);		void printBlockEquivalence(raw_ostream &OS, const BasicBlock *BB);
bool computeBlockWeights(Function &F);		bool computeBlockWeights(Function &F);
void findEquivalenceClasses(Function &F);		void findEquivalenceClasses(Function &F);
void findEquivalencesFor(BasicBlock *BB1, ArrayRef<BasicBlock *> Descendants,		void findEquivalencesFor(BasicBlock *BB1, ArrayRef<BasicBlock *> Descendants,
DominatorTreeBase<BasicBlock> *DomTree);		DominatorTreeBase<BasicBlock> *DomTree);
void propagateWeights(Function &F);		void propagateWeights(Function &F);
[...]
/// the corresponding inlined instance exists and is hot in profile. If		/// the corresponding inlined instance exists and is hot in profile. If
/// it is hot enough, inline the callsites and adds new callsites of the		/// it is hot enough, inline the callsites and adds new callsites of the
/// callee into the caller. If the call is an indirect call, first promote		/// callee into the caller. If the call is an indirect call, first promote
/// it to direct call. Each indirect call is limited with a single target.		/// it to direct call. Each indirect call is limited with a single target.
///		///
/// \param F function to perform iterative inlining.		/// \param F function to perform iterative inlining.
///		///
/// \returns True if there is any inline happened.		/// \returns True if there is any inline happened.
bool SampleProfileLoader::inlineHotFunctions(Function &F) {		bool SampleProfileLoader::inlineHotFunctions(
		Function &F, DenseSet<GlobalValue::GUID> &ImportGUIDs) {
tejohnson (inline comment, marked Done): Document the new parameter.
DenseSet<Instruction *> PromotedInsns;		DenseSet<Instruction *> PromotedInsns;
bool Changed = false;		bool Changed = false;
LLVMContext &Ctx = F.getContext();		LLVMContext &Ctx = F.getContext();
std::function<AssumptionCache &(Function &)> GetAssumptionCache = [&](		std::function<AssumptionCache &(Function &)> GetAssumptionCache = [&](
Function &F) -> AssumptionCache & { return ACT->getAssumptionCache(F); };		Function &F) -> AssumptionCache & { return ACT->getAssumptionCache(F); };
while (true) {		while (true) {
bool LocalChanged = false;		bool LocalChanged = false;
SmallVector<Instruction *, 10> CIS;		SmallVector<Instruction *, 10> CIS;
[...] for (auto I : CIS) {
->stripPointerCasts());		->stripPointerCasts());
PromotedInsns.insert(I);		PromotedInsns.insert(I);
} else {		} else {
DEBUG(dbgs() << "\nFailed to promote indirect call to "		DEBUG(dbgs() << "\nFailed to promote indirect call to "
<< CalleeFunctionName << " because " << Reason << "\n");		<< CalleeFunctionName << " because " << Reason << "\n");
continue;		continue;
}		}
}		}
if (!CalledFunction || !CalledFunction->getSubprogram())		if (!CalledFunction || !CalledFunction->getSubprogram()) {
		findCalleeFunctionSamples(*I)->importAllFunctions(
		ImportGUIDs, F.getParent(),
		Samples->getTotalSamples() * SampleProfileHotThreshold / 100);
continue;		continue;
		}
DebugLoc DLoc = I->getDebugLoc();		DebugLoc DLoc = I->getDebugLoc();
uint64_t NumSamples = findCalleeFunctionSamples(*I)->getTotalSamples();		uint64_t NumSamples = findCalleeFunctionSamples(*I)->getTotalSamples();
if (InlineFunction(CallSite(DI), IFI)) {		if (InlineFunction(CallSite(DI), IFI)) {
LocalChanged = true;		LocalChanged = true;
emitOptimizationRemark(Ctx, DEBUG_TYPE, F, DLoc,		emitOptimizationRemark(Ctx, DEBUG_TYPE, F, DLoc,
Twine("inlined hot callee '") +		Twine("inlined hot callee '") +
CalledFunction->getName() + "' with " +		CalledFunction->getName() + "' with " +
Twine(NumSamples) + " samples into '" +		Twine(NumSamples) + " samples into '" +
[...]
/// - If there is a self-referential edge, and the weight of the block is		/// - If there is a self-referential edge, and the weight of the block is
/// known, the weight for that edge is set to the weight of the block		/// known, the weight for that edge is set to the weight of the block
/// minus the weight of the other incoming edges to that block (if		/// minus the weight of the other incoming edges to that block (if
/// known).		/// known).
void SampleProfileLoader::propagateWeights(Function &F) {		void SampleProfileLoader::propagateWeights(Function &F) {
bool Changed = true;		bool Changed = true;
unsigned I = 0;		unsigned I = 0;

// Add an entry count to the function using the samples gathered
// at the function entry.
F.setEntryCount(Samples->getHeadSamples() + 1);

// If BB weight is larger than its corresponding loop's header BB weight,		// If BB weight is larger than its corresponding loop's header BB weight,
// use the BB weight to replace the loop header BB weight.		// use the BB weight to replace the loop header BB weight.
for (auto &BI : F) {		for (auto &BI : F) {
BasicBlock *BB = &BI;		BasicBlock *BB = &BI;
Loop *L = LI->getLoopFor(BB);		Loop *L = LI->getLoopFor(BB);
if (!L) {		if (!L) {
continue;		continue;
}		}
[...] bool SampleProfileLoader::emitAnnotations(Function &F) {
bool Changed = false;		bool Changed = false;

if (getFunctionLoc(F) == 0)		if (getFunctionLoc(F) == 0)
return false;		return false;

DEBUG(dbgs() << "Line number for the first instruction in " << F.getName()		DEBUG(dbgs() << "Line number for the first instruction in " << F.getName()
<< ": " << getFunctionLoc(F) << "\n");		<< ": " << getFunctionLoc(F) << "\n");

Changed |= inlineHotFunctions(F);		DenseSet<GlobalValue::GUID> ImportGUIDs;
		Changed |= inlineHotFunctions(F, ImportGUIDs);

// Compute basic block weights.		// Compute basic block weights.
Changed |= computeBlockWeights(F);		Changed |= computeBlockWeights(F);

if (Changed) {		if (Changed) {
		// Add an entry count to the function using the samples gathered
		// at the function entry.
tejohnson (inline comment, marked Done): Note that this is also recording the GUIDs that need to be imported for the IR to match.
		F.setEntryCount(Samples->getHeadSamples() + 1, &ImportGUIDs);

// Compute dominance and loop info needed for propagation.		// Compute dominance and loop info needed for propagation.
computeDominanceAndLoopInfo(F);		computeDominanceAndLoopInfo(F);

// Find equivalence classes.		// Find equivalence classes.
findEquivalenceClasses(F);		findEquivalenceClasses(F);

// Propagate weights to all edges.		// Propagate weights to all edges.
propagateWeights(F);		propagateWeights(F);
[...]
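
In `inlineHotFunctions` above, the call to `importAllFunctions` receives a threshold computed as the function's total samples times `SampleProfileHotThreshold`, divided by 100. The body of `importAllFunctions` lives in SampleProf.h and is not shown on this page, so the following is only a sketch of the assumed behaviour: record each inlined callee whose total samples exceed the threshold and recurse into its nested inline instances. `InlinedCallee`, `hotThreshold`, and `collectHot` are hypothetical names, and callee names are used in place of GUIDs.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one inlined callee recorded in a sample profile.
struct InlinedCallee {
  std::string Name;
  uint64_t TotalSamples;
  std::vector<InlinedCallee> Callees; // nested inline instances
};

// Threshold in the form used at the call site in the patch:
// total samples of the outer function * percentage / 100.
uint64_t hotThreshold(uint64_t TotalSamples, double Percent) {
  return static_cast<uint64_t>(TotalSamples * Percent / 100);
}

// Assumed behaviour: a callee at or below the threshold is skipped along
// with everything nested in it; a hot callee is recorded and searched.
void collectHot(const InlinedCallee &C, uint64_t Threshold,
                std::set<std::string> &Out) {
  if (C.TotalSamples <= Threshold)
    return;
  Out.insert(C.Name);
  for (const InlinedCallee &Nested : C.Callees)
    collectHot(Nested, Threshold, Out);
}
```

For illustration, with a threshold of 10, callees carrying 1000 and 200 samples would be recorded while one carrying exactly 10 samples would not. Whether the real comparison is strict is an assumption of this sketch.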

test/Bitcode/thinlto-function-summary-callgraph-profile-summary.ll

	; Test to check the callgraph in summary when there is PGO			; Test to check the callgraph in summary when there is PGO
	; RUN: opt -module-summary %s -o %t.o			; RUN: opt -module-summary %s -o %t.o
	; RUN: llvm-bcanalyzer -dump %t.o | FileCheck %s			; RUN: llvm-bcanalyzer -dump %t.o | FileCheck %s
	; RUN: opt -module-summary %p/Inputs/thinlto-function-summary-callgraph-profile-summary.ll -o %t2.o			; RUN: opt -module-summary %p/Inputs/thinlto-function-summary-callgraph-profile-summary.ll -o %t2.o
	; RUN: llvm-lto -thinlto -o %t3 %t.o %t2.o			; RUN: llvm-lto -thinlto -o %t3 %t.o %t2.o
	; RUN: llvm-bcanalyzer -dump %t3.thinlto.bc | FileCheck %s --check-prefix=COMBINED			; RUN: llvm-bcanalyzer -dump %t3.thinlto.bc | FileCheck %s --check-prefix=COMBINED


	; CHECK-LABEL: <GLOBALVAL_SUMMARY_BLOCK			; CHECK-LABEL: <GLOBALVAL_SUMMARY_BLOCK
	; CHECK-NEXT: <VERSION			; CHECK-NEXT: <VERSION
	; See if the call to func is registered, using the expected callsite count			; See if the call to func is registered, using the expected callsite count
	; and profile count, with value id matching the subsequent value symbol table.			; and profile count, with value id matching the subsequent value symbol table.
	; CHECK-NEXT: <PERMODULE_PROFILE {{.*}} op4=[[HOT1:.*]] op5=3 op6=[[COLD:.*]] op7=1 op8=[[HOT2:.*]] op9=3 op10=[[NONE1:.*]] op11=2 op12=[[HOT3:.*]] op13=3 op14=[[NONE2:.*]] op15=2 op16=[[NONE3:.*]] op17=2/>			; CHECK-NEXT: <PERMODULE_PROFILE {{.*}} op4=[[HOT1:.*]] op5=3 op6=[[COLD:.*]] op7=1 op8=[[HOT2:.*]] op9=3 op10=[[NONE1:.*]] op11=2 op12=[[HOT3:.*]] op13=3 op14=[[NONE2:.*]] op15=2 op16=[[NONE3:.*]] op17=2 op18=[[LEGACY:.*]] op19=3/>
	; CHECK-NEXT: </GLOBALVAL_SUMMARY_BLOCK>			; CHECK-NEXT: </GLOBALVAL_SUMMARY_BLOCK>
	; CHECK-LABEL: <VALUE_SYMTAB			; CHECK-LABEL: <VALUE_SYMTAB
	; CHECK-NEXT: <FNENTRY {{.*}} record string = 'hot_function			; CHECK-NEXT: <FNENTRY {{.*}} record string = 'hot_function
	; CHECK-DAG: <ENTRY abbrevid=6 op0=[[NONE1]] {{.*}} record string = 'none1'			; CHECK-DAG: <ENTRY abbrevid=6 op0=[[NONE1]] {{.*}} record string = 'none1'
	; CHECK-DAG: <ENTRY abbrevid=6 op0=[[COLD]] {{.*}} record string = 'cold'			; CHECK-DAG: <ENTRY abbrevid=6 op0=[[COLD]] {{.*}} record string = 'cold'
	; CHECK-DAG: <ENTRY abbrevid=6 op0=[[NONE2]] {{.*}} record string = 'none2'			; CHECK-DAG: <ENTRY abbrevid=6 op0=[[NONE2]] {{.*}} record string = 'none2'
	; CHECK-DAG: <ENTRY abbrevid=6 op0=[[NONE3]] {{.*}} record string = 'none3'			; CHECK-DAG: <ENTRY abbrevid=6 op0=[[NONE3]] {{.*}} record string = 'none3'
	; CHECK-DAG: <ENTRY abbrevid=6 op0=[[HOT1]] {{.*}} record string = 'hot1'			; CHECK-DAG: <ENTRY abbrevid=6 op0=[[HOT1]] {{.*}} record string = 'hot1'
	; CHECK-DAG: <ENTRY abbrevid=6 op0=[[HOT2]] {{.*}} record string = 'hot2'			; CHECK-DAG: <ENTRY abbrevid=6 op0=[[HOT2]] {{.*}} record string = 'hot2'
	; CHECK-DAG: <ENTRY abbrevid=6 op0=[[HOT3]] {{.*}} record string = 'hot3'			; CHECK-DAG: <ENTRY abbrevid=6 op0=[[HOT3]] {{.*}} record string = 'hot3'
				; CHECK-DAG: <COMBINED_ENTRY abbrevid=11 op0=[[LEGACY]] op1=123/>
	; CHECK-LABEL: </VALUE_SYMTAB>			; CHECK-LABEL: </VALUE_SYMTAB>

	; COMBINED: <GLOBALVAL_SUMMARY_BLOCK			; COMBINED: <GLOBALVAL_SUMMARY_BLOCK
	; COMBINED-NEXT: <VERSION			; COMBINED-NEXT: <VERSION
	; COMBINED-NEXT: <COMBINED abbrevid=			; COMBINED-NEXT: <COMBINED abbrevid=
	; COMBINED-NEXT: <COMBINED abbrevid=			; COMBINED-NEXT: <COMBINED abbrevid=
	; COMBINED-NEXT: <COMBINED abbrevid=			; COMBINED-NEXT: <COMBINED abbrevid=
	; COMBINED-NEXT: <COMBINED abbrevid=			; COMBINED-NEXT: <COMBINED abbrevid=
	[...]


	!41 = !{!"branch_weights", i32 1, i32 1000}			!41 = !{!"branch_weights", i32 1, i32 1000}
	!42 = !{!"branch_weights", i32 1, i32 1}			!42 = !{!"branch_weights", i32 1, i32 1}



	!llvm.module.flags = !{!1}			!llvm.module.flags = !{!1}
	!20 = !{!"function_entry_count", i64 110}			!20 = !{!"function_entry_count", i64 110, i64 123}

	!1 = !{i32 1, !"ProfileSummary", !2}			!1 = !{i32 1, !"ProfileSummary", !2}
	!2 = !{!3, !4, !5, !6, !7, !8, !9, !10}			!2 = !{!3, !4, !5, !6, !7, !8, !9, !10}
	!3 = !{!"ProfileFormat", !"InstrProf"}			!3 = !{!"ProfileFormat", !"InstrProf"}
	!4 = !{!"TotalCount", i64 10000}			!4 = !{!"TotalCount", i64 10000}
	!5 = !{!"MaxCount", i64 10}			!5 = !{!"MaxCount", i64 10}
	!6 = !{!"MaxInternalCount", i64 1}			!6 = !{!"MaxInternalCount", i64 1}
	!7 = !{!"MaxFunctionCount", i64 1000}			!7 = !{!"MaxFunctionCount", i64 1000}
	!8 = !{!"NumCounts", i64 3}			!8 = !{!"NumCounts", i64 3}
	!9 = !{!"NumFunctions", i64 3}			!9 = !{!"NumFunctions", i64 3}
	!10 = !{!"DetailedSummary", !11}			!10 = !{!"DetailedSummary", !11}
	!11 = !{!12, !13, !14}			!11 = !{!12, !13, !14}
	!12 = !{i32 10000, i64 100, i32 1}			!12 = !{i32 10000, i64 100, i32 1}
	!13 = !{i32 999000, i64 100, i32 1}			!13 = !{i32 999000, i64 100, i32 1}
	!14 = !{i32 999999, i64 1, i32 2}			!14 = !{i32 999999, i64 1, i32 2}

test/Transforms/SampleProfile/Inputs/import.prof

This file was added.

				main:10000:0
				3: foo:1000
				3: bar:200
				4: baz:10

test/Transforms/SampleProfile/import.ll

This file was added.

				; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/import.prof -S | FileCheck %s

				; Tests whether the functions in the inline stack are added to the
				; function_entry_count metadata.

				declare void @foo()

				define void @main() !dbg !7 {
				call void @foo(), !dbg !18
				ret void
				}
mehdi_amini (inline comment, marked Done): Can you document what this is in the test, and also document what the test itself is doing?

				; GUIDs of foo and bar should be included in the metadata to make sure hot
				; inline stacks are imported.
				; CHECK: !{!"function_entry_count", i64 1, i64 6699318081062747564, i64 -2012135647395072713}

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!8, !9}
				!llvm.ident = !{!10}

				!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, producer: "clang version 3.5 ", isOptimized: false, emissionKind: NoDebug, file: !1, enums: !2, retainedTypes: !2, globals: !2, imports: !2)
				!1 = !DIFile(filename: "calls.cc", directory: ".")
				!2 = !{}
				!6 = !DISubroutineType(types: !2)
				!7 = distinct !DISubprogram(name: "main", line: 7, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: false, unit: !0, scopeLine: 7, file: !1, scope: !1, type: !6, variables: !2)
				!8 = !{i32 2, !"Dwarf Version", i32 4}
				!9 = !{i32 1, !"Debug Info Version", i32 3}
				!10 = !{!"clang version 3.5 "}
				!15 = !DILexicalBlockFile(discriminator: 1, file: !1, scope: !7)
				!17 = distinct !DILexicalBlock(line: 10, column: 0, file: !1, scope: !7)
				!18 = !DILocation(line: 10, scope: !17)
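
The second GUID in the CHECK line of import.ll above is negative because textual IR prints `i64` constants as signed values, while a GUID is an unsigned 64-bit quantity. A minimal sketch of that reinterpretation follows; `asPrintedI64` and `fromPrintedI64` are hypothetical helper names, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the value a 64-bit GUID shows as when it is printed
// as a signed i64 (two's-complement view of the same bits).
int64_t asPrintedI64(uint64_t GUID) { return static_cast<int64_t>(GUID); }

// Inverse: recover the unsigned GUID from the signed value in the IR text.
uint64_t fromPrintedI64(int64_t Printed) {
  return static_cast<uint64_t>(Printed);
}
```

Under this reading, `i64 -2012135647395072713` in the test denotes the unsigned GUID 16434608426314478903, and `i64 6699318081062747564` is unchanged because its top bit is clear.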

test/Verifier/function-metadata-bad.ll

	; RUN: not llvm-as < %s -o /dev/null 2>&1 | FileCheck %s			; RUN: not llvm-as < %s -o /dev/null 2>&1 | FileCheck %s

	define i32 @bad1() !prof !0 {			define i32 @bad1() !prof !0 {
	ret i32 0			ret i32 0
	}			}

	!0 = !{i32 123, i32 3}			!0 = !{i32 123, i32 3}
	; CHECK: assembly parsed, but does not verify as correct!			; CHECK: assembly parsed, but does not verify as correct!
	; CHECK-NEXT: expected string with name of the !prof annotation			; CHECK-NEXT: expected string with name of the !prof annotation
	; CHECK-NEXT: !0 = !{i32 123, i32 3}			; CHECK-NEXT: !0 = !{i32 123, i32 3}

	define i32 @bad2() !prof !1 {			define i32 @bad2() !prof !1 {
	ret i32 0			ret i32 0
	}			}

	!1 = !{!"function_entry_count"}			!1 = !{!"function_entry_count"}
	; CHECK-NEXT: !prof annotations should have exactly 2 operands			; CHECK-NEXT: !prof annotations should have no less than 2 operands
	; CHECK-NEXT: !1 = !{!"function_entry_count"}			; CHECK-NEXT: !1 = !{!"function_entry_count"}


	define i32 @bad3() !prof !2 {			define i32 @bad3() !prof !2 {
	ret i32 0			ret i32 0
	}			}

	!2 = !{!"some_other_count", i64 200}			!2 = !{!"some_other_count", i64 200}
	[...]

test/Verifier/metadata-function-prof.ll

	; RUN: not llvm-as %s -disable-output 2>&1 | FileCheck %s			; RUN: not llvm-as %s -disable-output 2>&1 | FileCheck %s

	; CHECK: function declaration may not have a !prof attachment			; CHECK: function declaration may not have a !prof attachment
	declare !prof !0 void @f1()			declare !prof !0 void @f1()

	define void @f2() !prof !0 {			define void @f2() !prof !0 {
	unreachable			unreachable
	}			}

	; CHECK: function must have a single !prof attachment			; CHECK: function must have a single !prof attachment
	define void @f3() !prof !0 !prof !0 {			define void @f3() !prof !0 !prof !0 {
	unreachable			unreachable
	}			}

	!0 = !{}			!0 = !{!"function_entry_count", i64 100}

Revision Contents

Diff 88790
