This is an archive of the discontinued LLVM Phabricator instance.

Differential D24638

[thinlto] Basic thinlto fdo heuristic
ClosedPublic

Authored by Prazek on Sep 15 2016, 3:31 PM.


Details

Reviewers

tejohnson
eraman
mehdi_amini

Commits

rGd9830eb79fdc: [thinlto] Basic thinlto fdo heuristic
rL282437: [thinlto] Basic thinlto fdo heuristic

Summary

This patch improves the ThinLTO importer
by importing functions up to 3x larger when they are called from a hot block.

I compared performance with trunk on SPEC and saw improvements
of about 2% on povray and 3.33% on milc. These results seem
to be consistent and match the results Teresa got with her simple
heuristic. Some benchmarks got slower, but I think they are just
noisy (mcf, xalancbmk, omnetpp); I am running the benchmarks again with
more iterations to confirm. The geomean over all benchmarks, including the noisy ones,
was about +0.02%.

I see a much bigger improvement on the Google branch with Easwaran's patch
for PGO callsite inlining (the inliner actually inlines those big functions).
Overall I see a +0.5% improvement, and I get +8.65% on povray.
So I guess we will see a much bigger change when Easwaran's patch lands
(it depends on the new pass manager), but it is still worth putting this into trunk
before then.

Implementation detail changes:

Removed CallsiteCount.
ProfileCount got replaced by Hotness.
hot-import-multiplier is set to 3.0 for now.

I didn't have time to tune it, but I see that we get most of the interesting
functions with 3, so there is not much performance difference with a higher value, and
binary size doesn't grow as much as with 10.0.

Diff Detail

Repository: rL LLVM

Event Timeline

Prazek updated this revision to Diff 71571. Sep 15 2016, 3:31 PM

Prazek retitled this revision to [thinlto] Basic thinlto fdo heuristic.

Prazek updated this object.

Prazek added reviewers: tejohnson, eraman, mehdi_amini.

Prazek added a subscriber: llvm-commits.

Herald added a subscriber: mehdi_amini. Sep 15 2016, 3:31 PM

Thanks!

I haven't gone through your new test cases yet, but there are a few comments below so far (one correctness issue with handling old versions).

lib/Analysis/ModuleSummaryAnalysis.cpp
66 ↗	(On Diff #71571)	Add blank line before
lib/Bitcode/Reader/BitcodeReader.cpp
6220 ↗	(On Diff #71571)	Use something like OldProfileFormat to be more explicit. That way the name of this variable won't need changing when we bump the version again.
6303 ↗	(On Diff #71571)	Only skip 2 fields if HasProfile; otherwise just skip 1 (callsite_count). Can you make sure there is a test that has the old format bitcode (both with and without old profile)? I.e. you would commit old bitcode; there are some existing tests that do this (look at the .bc files committed in test/Bitcode for examples).
6362 ↗	(On Diff #71571)	Record descriptions in include/llvm/Bitcode/LLVMBitCodes.h need to be updated accordingly.
6393 ↗	(On Diff #71571)	Ditto here.
lib/LTO/ThinLTOCodeGenerator.cpp
382 ↗	(On Diff #71571)	The PSI is not too useful without a BFI - perhaps construct one here as we do e.g. in PartialInlinerImpl::unswitchFunction? Ok as a follow-on though. Otherwise change PSI to be passed as an optional pointer.
lib/Transforms/IPO/FunctionImport.cpp
287 ↗	(On Diff #71571)	Add a FIXME about using the Code hotness to reduce threshold

small fixes

Prazek added inline comments. Sep 16 2016, 10:35 AM

lib/Bitcode/Reader/BitcodeReader.cpp
6303 ↗	(On Diff #71571)	good catch. I didn't know how to properly test it :(

Great to see this!

Can you add a test with three functions, hot/cold/unknown, and show the impact of the importing with a fixed -import-instr-limit and varying -import-hot-multiplier?

lib/Analysis/ModuleSummaryAnalysis.cpp
113 ↗	(On Diff #71672)	Can you split the statement:
    auto Hotness = ScaledCount ? getHotness(ScaledCount.getValue(), PSI)
                               : CalleeInfo::HotnessType::Unknown;
    CallGraphEdges[CalleeId].updateHotness(Hotness);
Same number of lines, but the flow is easier to read.
lib/LTO/ThinLTOCodeGenerator.cpp
382 ↗	(On Diff #71672)	At this point, not having PSI (i.e. making it optional) is fine.

In D24638#545076, @mehdi_amini wrote:

Great to see this!

Can you add a test with three functions, hot/cold/unknown, and show the impact of the importing with a fixed -import-instr-limit and varying -import-hot-multiplier?

Good point, I forgot to test this.

lib/Bitcode/Reader/BitcodeReader.cpp
6303 ↗	(On Diff #71672)	Do you have some idea what this test should look like? I was thinking about saving bc files from the old version, then reading them and seeing what happens. Is there a way to just read a bc file and then write it again? I could then compare the output with some existing test.
lib/LTO/ThinLTOCodeGenerator.cpp
382 ↗	(On Diff #71672)	Will look into your example in a bit, but it seems to add additional handling of not having PSI, whereas PSI already has this logic inside.

Prazek added inline comments. Sep 16 2016, 4:56 PM

lib/Bitcode/Reader/BitcodeReader.cpp
6303 ↗	(On Diff #71672)	I guess this would be good: `llvm-dis < %s.bc -o - | llvm-as | llvm-dis | FileCheck`

mehdi_amini added inline comments. Sep 16 2016, 4:58 PM

lib/Bitcode/Reader/BitcodeReader.cpp
6303 ↗	(On Diff #71672)	This does not read summaries. Try `llvm-lto -thinlto-index-stats %s.bc`

It is in some recent patch, right?

small fixes
Added regression test
something
fixes
added test

tejohnson added inline comments. Sep 23 2016, 10:36 AM

lib/Bitcode/Reader/BitcodeReader.cpp
6457 ↗	(On Diff #72220)	Blank line above (clang-format should fix?), and document
6464 ↗	(On Diff #72220)	I think this block would be clearer if restructured like:
    if (IsOldProfileFormat) {
      I += 1; // Skip old callsitecount field
      if (HasProfile)
        I += 1; // Skip old profilecount field
    } else
      Hotness = static_cast<CalleeInfo::HotnessType>(Record[++I]);
    ...
(Note the ++I change in the Record access too)
lib/LTO/ThinLTOCodeGenerator.cpp
382 ↗	(On Diff #72220)	Not sure I follow. The example was to show how to also create a BFI, so that the PSI is useful. But instead you could follow Mehdi's suggestion and make the PSI argument optional.
lib/Transforms/IPO/FunctionImport.cpp
288 ↗	(On Diff #72220)	s/Fixme/FIXME/
test/Transforms/FunctionImport/hotness_based_import.ll
14 ↗	(On Diff #72220)	Also check that hot3 not imported?
19 ↗	(On Diff #72220)	s/threat/treat/
27 ↗	(On Diff #72220)	hot3?
30 ↗	(On Diff #72220)	s/form/from/
38 ↗	(On Diff #72220)	hot3?
53 ↗	(On Diff #72220)	why is hot2 here?
54 ↗	(On Diff #72220)	ditto for none1

Prazek added inline comments. Sep 23 2016, 12:59 PM

lib/Bitcode/Reader/BitcodeReader.cpp
6464 ↗	(On Diff #72220)	What do you mean by "(Note the ++I change in the Record access too)"? I agree that your structure is better.
lib/LTO/ThinLTOCodeGenerator.cpp
382 ↗	(On Diff #72220)	I will look into that, but my point is that PSI already handles not having the profile data, so when asked if a count is hot/cold it always returns false. Changing it to optional will require checking if PSI is set. It won't be a big hassle, but I guess it will make it clearer that the PSI does not always return useful information.
test/Transforms/FunctionImport/hotness_based_import.ll
14 ↗	(On Diff #72220)	oh yes
53 ↗	(On Diff #72220)	to check if it takes max of the hotness of block. It is also in hot block.
54 ↗	(On Diff #72220)	same

[thinlto] replace profile count with hotness
fixes

fixes

fixes

LGTM thanks!

lib/Bitcode/Reader/BitcodeReader.cpp
6464 ↗	(On Diff #72220)	> what do you mean with "(Note the ++I change in the Record access too)"
I just was calling attention to the fact that this line was different than what your old structure required. Looks like you adapted this change fine.

This revision is now accepted and ready to land. Sep 26 2016, 1:25 PM

Closed by commit rL282437: [thinlto] Basic thinlto fdo heuristic (authored by Prazek). Sep 26 2016, 1:46 PM

This revision was automatically updated to reflect the committed changes.

I'm not sure this is correct, see inline.

llvm/trunk/lib/Transforms/IPO/FunctionImport.cpp
291	This does not seem correct to me. The multiplier should be applied only to the initial "seed" of the call chain, not at every link in the chain.
368	Here the `ImportInstrLimit` can benefit from the multiplier.

tejohnson added inline comments. Sep 27 2016, 1:53 PM

llvm/trunk/lib/Transforms/IPO/FunctionImport.cpp
291	The multiplier is applied to each call independently based on whether it is marked hot. E.g. if we have: A -> B (hot call) and then B has 2 calls: B -> C1 (hot call) and B -> C2 (cold call) I assume you are referring to A as the "seed" of the call chain? We want to treat the two different calls from B differently, it doesn't matter that the call from A -> B is hot.
368	No since the calls from FuncSummary will be treated appropriately if they are hot.

Agree with what Teresa said. Another example:

A->B->C
where B and C are external, C is called from a hot block, and B is called from a normal block.

The threshold for B will be 100, and for C it will be 70*3 (100*0.7*bonus).

mehdi_amini added inline comments. Sep 27 2016, 6:17 PM

llvm/trunk/lib/Transforms/IPO/FunctionImport.cpp
291	Ok I can see the logic. But I'm not convinced by the multiplier effect. i.e. right now as you get further from the original function, the threshold would increase since if I follow correctly we multiply by 3 here and by 0.7 later? I may miss something here. i.e. with a sequence of A->B->C->D (all hot) the threshold evolves this way:
call to B: 100
call to C: 100 * 3 * 0.7 = 210
call to D: 210 * 3 * 0.7 = 441
I wonder if a Bonus wouldn't be more appropriate.

tejohnson added inline comments. Sep 28 2016, 6:21 AM

llvm/trunk/lib/Transforms/IPO/FunctionImport.cpp
291	> Ok I can see the logic. But I'm not convinced by the multiplier effect. i.e. right now as you get further from the original function, the threshold would increase since if I follow correctly we multiply by 3 here and by 0.7 later? I may miss something here. i.e. with a sequence of A->B->C->D (all hot) the threshold evolves this way:
> call to B: 100
> call to C: 100 * 3 * 0.7 = 210
> call to D: 210 * 3 * 0.7 = 441
> I wonder if a Bonus wouldn't be more appropriate.
This isn't the case since (original) Threshold, not NewThreshold, is pushed into the Worklist. So the next level of callees again start with the original threshold, which is then decayed before being passed in here and multiplied. So in your example above, unless I am missing something, we get:
call to B: 100 * 3 = 300
call to C: 100 * 0.7 * 3 = 210
call to D: 100 * 0.7 * 0.7 * 3 = 147

mehdi_amini added inline comments. Sep 28 2016, 8:17 AM

llvm/trunk/lib/Transforms/IPO/FunctionImport.cpp
291	OK, you're right, I got confused by the name `NewThreshold`. I still wonder if we shouldn't push another threshold with bonus on the stack. And in fact this is kind of what D24976 attempts to do.

mehdi_amini mentioned this in D24976: [thinlto] Don't decay threshold for hot callsites. Sep 28 2016, 8:19 AM

Revision Contents

Path	Size
llvm/trunk/include/llvm/Analysis/ModuleSummaryAnalysis.h	5 lines
llvm/trunk/include/llvm/Bitcode/LLVMBitCodes.h	8 lines
llvm/trunk/include/llvm/IR/ModuleSummaryIndex.h	20 lines
llvm/trunk/lib/Analysis/ModuleSummaryAnalysis.cpp	54 lines
llvm/trunk/lib/Bitcode/Reader/BitcodeReader.cpp	57 lines
llvm/trunk/lib/Bitcode/Writer/BitcodeWriter.cpp	20 lines
llvm/trunk/lib/LTO/ThinLTOCodeGenerator.cpp	4 lines
llvm/trunk/lib/Transforms/IPO/FunctionImport.cpp	15 lines
llvm/trunk/test/Bitcode/Inputs/thinlto-function-summary-callgraph-combined.1.bc
llvm/trunk/test/Bitcode/Inputs/thinlto-function-summary-callgraph-pgo-combined.1.bc
llvm/trunk/test/Bitcode/Inputs/thinlto-function-summary-callgraph-pgo.1.bc
llvm/trunk/test/Bitcode/Inputs/thinlto-function-summary-callgraph-profile-summary.ll	27 lines
llvm/trunk/test/Bitcode/Inputs/thinlto-function-summary-callgraph.1.bc
llvm/trunk/test/Bitcode/summary_version.ll	2 lines
llvm/trunk/test/Bitcode/thinlto-alias.ll	4 lines
llvm/trunk/test/Bitcode/thinlto-function-summary-callgraph-pgo.ll	17 lines
llvm/trunk/test/Bitcode/thinlto-function-summary-callgraph-profile-summary.ll	98 lines
llvm/trunk/test/Bitcode/thinlto-function-summary-callgraph.ll	12 lines
llvm/trunk/test/Bitcode/thinlto-function-summary-refgraph.ll	10 lines
llvm/trunk/test/Transforms/FunctionImport/Inputs/hotness_based_import.ll	43 lines
llvm/trunk/test/Transforms/FunctionImport/hotness_based_import.ll	106 lines

Diff 72558

llvm/trunk/include/llvm/Analysis/ModuleSummaryAnalysis.h

	(15 lines not shown)

	#include "llvm/ADT/STLExtras.h"			#include "llvm/ADT/STLExtras.h"
	#include "llvm/IR/ModuleSummaryIndex.h"			#include "llvm/IR/ModuleSummaryIndex.h"
	#include "llvm/IR/PassManager.h"			#include "llvm/IR/PassManager.h"
	#include "llvm/Pass.h"			#include "llvm/Pass.h"

	namespace llvm {			namespace llvm {
	class BlockFrequencyInfo;			class BlockFrequencyInfo;
				class ProfileSummaryInfo;

	/// Direct function to compute a \c ModuleSummaryIndex from a given module.			/// Direct function to compute a \c ModuleSummaryIndex from a given module.
	///			///
	/// If operating within a pass manager which has defined ways to compute the \c			/// If operating within a pass manager which has defined ways to compute the \c
	/// BlockFrequencyInfo for a given function, that can be provided via			/// BlockFrequencyInfo for a given function, that can be provided via
	/// a std::function callback. Otherwise, this routine will manually construct			/// a std::function callback. Otherwise, this routine will manually construct
	/// that information.			/// that information.
	ModuleSummaryIndex buildModuleSummaryIndex(			ModuleSummaryIndex buildModuleSummaryIndex(
	const Module &M,			const Module &M,
	std::function<BlockFrequencyInfo *(const Function &F)> GetBFICallback =			std::function<BlockFrequencyInfo *(const Function &F)> GetBFICallback,
	nullptr);			ProfileSummaryInfo *PSI);

	/// Analysis pass to provide the ModuleSummaryIndex object.			/// Analysis pass to provide the ModuleSummaryIndex object.
	class ModuleSummaryIndexAnalysis			class ModuleSummaryIndexAnalysis
	: public AnalysisInfoMixin<ModuleSummaryIndexAnalysis> {			: public AnalysisInfoMixin<ModuleSummaryIndexAnalysis> {
	friend AnalysisInfoMixin<ModuleSummaryIndexAnalysis>;			friend AnalysisInfoMixin<ModuleSummaryIndexAnalysis>;
	static char PassID;			static char PassID;

	public:			public:
	(37 lines not shown)

llvm/trunk/include/llvm/Bitcode/LLVMBitCodes.h

(188 lines not shown)	enum ModulePathSymtabCodes {
MST_CODE_ENTRY = 1, // MST_ENTRY: [modid, namechar x N]		MST_CODE_ENTRY = 1, // MST_ENTRY: [modid, namechar x N]
MST_CODE_HASH = 2, // MST_HASH: [5*i32]		MST_CODE_HASH = 2, // MST_HASH: [5*i32]
};		};

// The summary section uses different codes in the per-module		// The summary section uses different codes in the per-module
// and combined index cases.		// and combined index cases.
enum GlobalValueSummarySymtabCodes {		enum GlobalValueSummarySymtabCodes {
// PERMODULE: [valueid, flags, instcount, numrefs, numrefs x valueid,		// PERMODULE: [valueid, flags, instcount, numrefs, numrefs x valueid,
// n x (valueid, callsitecount)]		// n x (valueid)]
FS_PERMODULE = 1,		FS_PERMODULE = 1,
// PERMODULE_PROFILE: [valueid, flags, instcount, numrefs,		// PERMODULE_PROFILE: [valueid, flags, instcount, numrefs,
// numrefs x valueid,		// numrefs x valueid,
// n x (valueid, callsitecount, profilecount)]		// n x (valueid, hotness)]
FS_PERMODULE_PROFILE = 2,		FS_PERMODULE_PROFILE = 2,
// PERMODULE_GLOBALVAR_INIT_REFS: [valueid, flags, n x valueid]		// PERMODULE_GLOBALVAR_INIT_REFS: [valueid, flags, n x valueid]
FS_PERMODULE_GLOBALVAR_INIT_REFS = 3,		FS_PERMODULE_GLOBALVAR_INIT_REFS = 3,
// COMBINED: [valueid, modid, flags, instcount, numrefs, numrefs x valueid,		// COMBINED: [valueid, modid, flags, instcount, numrefs, numrefs x valueid,
// n x (valueid, callsitecount)]		// n x (valueid)]
FS_COMBINED = 4,		FS_COMBINED = 4,
// COMBINED_PROFILE: [valueid, modid, flags, instcount, numrefs,		// COMBINED_PROFILE: [valueid, modid, flags, instcount, numrefs,
// numrefs x valueid,		// numrefs x valueid,
// n x (valueid, callsitecount, profilecount)]		// n x (valueid, hotness)]
FS_COMBINED_PROFILE = 5,		FS_COMBINED_PROFILE = 5,
// COMBINED_GLOBALVAR_INIT_REFS: [valueid, modid, flags, n x valueid]		// COMBINED_GLOBALVAR_INIT_REFS: [valueid, modid, flags, n x valueid]
FS_COMBINED_GLOBALVAR_INIT_REFS = 6,		FS_COMBINED_GLOBALVAR_INIT_REFS = 6,
// ALIAS: [valueid, flags, valueid]		// ALIAS: [valueid, flags, valueid]
FS_ALIAS = 7,		FS_ALIAS = 7,
// COMBINED_ALIAS: [valueid, modid, flags, valueid]		// COMBINED_ALIAS: [valueid, modid, flags, valueid]
FS_COMBINED_ALIAS = 8,		FS_COMBINED_ALIAS = 8,
// COMBINED_ORIGINAL_NAME: [original_name_hash]		// COMBINED_ORIGINAL_NAME: [original_name_hash]
(321 lines not shown)

llvm/trunk/include/llvm/IR/ModuleSummaryIndex.h

	(24 lines not shown)
	#include "llvm/IR/Module.h"			#include "llvm/IR/Module.h"

	#include <array>			#include <array>

	namespace llvm {			namespace llvm {

	/// \brief Class to accumulate and hold information about a callee.			/// \brief Class to accumulate and hold information about a callee.
	struct CalleeInfo {			struct CalleeInfo {
	/// The static number of callsites calling corresponding function.			enum class HotnessType : uint8_t { Unknown = 0, Cold = 1, None = 2, Hot = 3 };
	unsigned CallsiteCount;			HotnessType Hotness = HotnessType::Unknown;
	/// The cumulative profile count of calls to corresponding function
	/// (if using PGO, otherwise 0).			CalleeInfo() = default;
	uint64_t ProfileCount;			explicit CalleeInfo(HotnessType Hotness) : Hotness(Hotness) {}
	CalleeInfo() : CallsiteCount(0), ProfileCount(0) {}
	CalleeInfo(unsigned CallsiteCount, uint64_t ProfileCount)			void updateHotness(const HotnessType OtherHotness) {
	: CallsiteCount(CallsiteCount), ProfileCount(ProfileCount) {}			Hotness = std::max(Hotness, OtherHotness);
	CalleeInfo &operator+=(uint64_t RHSProfileCount) {
	CallsiteCount++;
	ProfileCount += RHSProfileCount;
	return *this;
	}			}
	};			};

	/// Struct to hold value either by GUID or Value*, depending on whether this			/// Struct to hold value either by GUID or Value*, depending on whether this
	/// is a combined or per-module index, respectively.			/// is a combined or per-module index, respectively.
	struct ValueInfo {			struct ValueInfo {
	/// The value representation used in this instance.			/// The value representation used in this instance.
	enum ValueInfoKind {			enum ValueInfoKind {
	(473 lines not shown)

llvm/trunk/lib/Analysis/ModuleSummaryAnalysis.cpp

(12 lines not shown)
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Analysis/ModuleSummaryAnalysis.h"		#include "llvm/Analysis/ModuleSummaryAnalysis.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"		#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/BlockFrequencyInfoImpl.h"		#include "llvm/Analysis/BlockFrequencyInfoImpl.h"
#include "llvm/Analysis/BranchProbabilityInfo.h"		#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/IndirectCallPromotionAnalysis.h"		#include "llvm/Analysis/IndirectCallPromotionAnalysis.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
		#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/InstIterator.h"		#include "llvm/IR/InstIterator.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/ValueSymbolTable.h"		#include "llvm/IR/ValueSymbolTable.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
using namespace llvm;		using namespace llvm;

(29 lines not shown)	for (const auto &OI : U->operands()) {
RefEdges.insert(Operand);		RefEdges.insert(Operand);
continue;		continue;
}		}
Worklist.push_back(Operand);		Worklist.push_back(Operand);
}		}
}		}
}		}

		static CalleeInfo::HotnessType getHotness(uint64_t ProfileCount,
		ProfileSummaryInfo *PSI) {
		if (!PSI)
		return CalleeInfo::HotnessType::Unknown;
		if (PSI->isHotCount(ProfileCount))
		return CalleeInfo::HotnessType::Hot;
		if (PSI->isColdCount(ProfileCount))
		return CalleeInfo::HotnessType::Cold;
		return CalleeInfo::HotnessType::None;
		}

static void computeFunctionSummary(ModuleSummaryIndex &Index, const Module &M,		static void computeFunctionSummary(ModuleSummaryIndex &Index, const Module &M,
const Function &F, BlockFrequencyInfo *BFI) {		const Function &F, BlockFrequencyInfo *BFI,
		ProfileSummaryInfo *PSI) {
// Summary not currently supported for anonymous functions, they must		// Summary not currently supported for anonymous functions, they must
// be renamed.		// be renamed.
if (!F.hasName())		if (!F.hasName())
return;		return;

unsigned NumInsts = 0;		unsigned NumInsts = 0;
// Map from callee ValueId to profile count. Used to accumulate profile		// Map from callee ValueId to profile count. Used to accumulate profile
// counts for all static calls to a given callee.		// counts for all static calls to a given callee.
(16 lines not shown)	for (const Instruction &I : BB) {
// Check if this is a direct call to a known function.		// Check if this is a direct call to a known function.
if (CalledFunction) {		if (CalledFunction) {
// Skip nameless and intrinsics.		// Skip nameless and intrinsics.
if (!CalledFunction->hasName() || CalledFunction->isIntrinsic())		if (!CalledFunction->hasName() || CalledFunction->isIntrinsic())
continue;		continue;
auto ScaledCount = BFI ? BFI->getBlockProfileCount(&BB) : None;		auto ScaledCount = BFI ? BFI->getBlockProfileCount(&BB) : None;
auto *CalleeId =		auto *CalleeId =
M.getValueSymbolTable().lookup(CalledFunction->getName());		M.getValueSymbolTable().lookup(CalledFunction->getName());
CallGraphEdges[CalleeId] += (ScaledCount ? ScaledCount.getValue() : 0);
		auto Hotness = ScaledCount ? getHotness(ScaledCount.getValue(), PSI)
		: CalleeInfo::HotnessType::Unknown;
		CallGraphEdges[CalleeId].updateHotness(Hotness);
} else {		} else {
const auto *CI = dyn_cast<CallInst>(&I);		const auto *CI = dyn_cast<CallInst>(&I);
// Skip inline assembly calls.		// Skip inline assembly calls.
if (CI && CI->isInlineAsm())		if (CI && CI->isInlineAsm())
continue;		continue;
// Skip direct calls.		// Skip direct calls.
if (!CS.getCalledValue() || isa<Constant>(CS.getCalledValue()))		if (!CS.getCalledValue() || isa<Constant>(CS.getCalledValue()))
continue;		continue;

uint32_t NumVals, NumCandidates;		uint32_t NumVals, NumCandidates;
uint64_t TotalCount;		uint64_t TotalCount;
auto CandidateProfileData =		auto CandidateProfileData =
ICallAnalysis.getPromotionCandidatesForInstruction(		ICallAnalysis.getPromotionCandidatesForInstruction(
&I, NumVals, TotalCount, NumCandidates);		&I, NumVals, TotalCount, NumCandidates);
for (auto &Candidate : CandidateProfileData)		for (auto &Candidate : CandidateProfileData)
IndirectCallEdges[Candidate.Value] += Candidate.Count;		IndirectCallEdges[Candidate.Value].updateHotness(
		getHotness(Candidate.Count, PSI));
}		}
}		}

GlobalValueSummary::GVFlags Flags(F);		GlobalValueSummary::GVFlags Flags(F);
std::unique_ptr<FunctionSummary> FuncSummary =		std::unique_ptr<FunctionSummary> FuncSummary =
llvm::make_unique<FunctionSummary>(Flags, NumInsts);		llvm::make_unique<FunctionSummary>(Flags, NumInsts);
FuncSummary->addCallGraphEdges(CallGraphEdges);		FuncSummary->addCallGraphEdges(CallGraphEdges);
FuncSummary->addCallGraphEdges(IndirectCallEdges);		FuncSummary->addCallGraphEdges(IndirectCallEdges);
(10 lines not shown)	static void computeVariableSummary(ModuleSummaryIndex &Index,
std::unique_ptr<GlobalVarSummary> GVarSummary =		std::unique_ptr<GlobalVarSummary> GVarSummary =
llvm::make_unique<GlobalVarSummary>(Flags);		llvm::make_unique<GlobalVarSummary>(Flags);
GVarSummary->addRefEdges(RefEdges);		GVarSummary->addRefEdges(RefEdges);
Index.addGlobalValueSummary(V.getName(), std::move(GVarSummary));		Index.addGlobalValueSummary(V.getName(), std::move(GVarSummary));
}		}

ModuleSummaryIndex llvm::buildModuleSummaryIndex(		ModuleSummaryIndex llvm::buildModuleSummaryIndex(
const Module &M,		const Module &M,
std::function<BlockFrequencyInfo *(const Function &F)> GetBFICallback) {		std::function<BlockFrequencyInfo *(const Function &F)> GetBFICallback,
		ProfileSummaryInfo *PSI) {
ModuleSummaryIndex Index;		ModuleSummaryIndex Index;
// Check if the module can be promoted, otherwise just disable importing from		// Check if the module can be promoted, otherwise just disable importing from
// it by not emitting any summary.		// it by not emitting any summary.
// FIXME: we could still import into it most of the time.		// FIXME: we could still import into it most of the time.
if (!moduleCanBeRenamedForThinLTO(M))		if (!moduleCanBeRenamedForThinLTO(M))
return Index;		return Index;

// Compute summaries for all functions defined in module, and save in the		// Compute summaries for all functions defined in module, and save in the
// index.		// index.
for (auto &F : M) {		for (auto &F : M) {
if (F.isDeclaration())		if (F.isDeclaration())
continue;		continue;

BlockFrequencyInfo *BFI = nullptr;		BlockFrequencyInfo *BFI = nullptr;
std::unique_ptr<BlockFrequencyInfo> BFIPtr;		std::unique_ptr<BlockFrequencyInfo> BFIPtr;
if (GetBFICallback)		if (GetBFICallback)
BFI = GetBFICallback(F);		BFI = GetBFICallback(F);
else if (F.getEntryCount().hasValue()) {		else if (F.getEntryCount().hasValue()) {
LoopInfo LI{DominatorTree(const_cast<Function &>(F))};		LoopInfo LI{DominatorTree(const_cast<Function &>(F))};
BranchProbabilityInfo BPI{F, LI};		BranchProbabilityInfo BPI{F, LI};
BFIPtr = llvm::make_unique<BlockFrequencyInfo>(F, BPI, LI);		BFIPtr = llvm::make_unique<BlockFrequencyInfo>(F, BPI, LI);
BFI = BFIPtr.get();		BFI = BFIPtr.get();
}		}

computeFunctionSummary(Index, M, F, BFI);		computeFunctionSummary(Index, M, F, BFI, PSI);
}		}

// Compute summaries for all variables defined in module, and save in the		// Compute summaries for all variables defined in module, and save in the
// index.		// index.
for (const GlobalVariable &G : M.globals()) {		for (const GlobalVariable &G : M.globals()) {
if (G.isDeclaration())		if (G.isDeclaration())
continue;		continue;
computeVariableSummary(Index, G);		computeVariableSummary(Index, G);
}		}
return Index;		return Index;
}		}

char ModuleSummaryIndexAnalysis::PassID;		char ModuleSummaryIndexAnalysis::PassID;

ModuleSummaryIndex		ModuleSummaryIndex
ModuleSummaryIndexAnalysis::run(Module &M, ModuleAnalysisManager &AM) {		ModuleSummaryIndexAnalysis::run(Module &M, ModuleAnalysisManager &AM) {
		ProfileSummaryInfo &PSI = AM.getResult<ProfileSummaryAnalysis>(M);
auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();		auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
return buildModuleSummaryIndex(M, [&FAM](const Function &F) {		return buildModuleSummaryIndex(
return &FAM.getResult<BlockFrequencyAnalysis>(*const_cast<Function *>(&F));		M,
});		[&FAM](const Function &F) {
		return &FAM.getResult<BlockFrequencyAnalysis>(
		*const_cast<Function *>(&F));
		},
		&PSI);
}		}

char ModuleSummaryIndexWrapperPass::ID = 0;		char ModuleSummaryIndexWrapperPass::ID = 0;
INITIALIZE_PASS_BEGIN(ModuleSummaryIndexWrapperPass, "module-summary-analysis",		INITIALIZE_PASS_BEGIN(ModuleSummaryIndexWrapperPass, "module-summary-analysis",
"Module Summary Analysis", false, true)		"Module Summary Analysis", false, true)
INITIALIZE_PASS_DEPENDENCY(BlockFrequencyInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(BlockFrequencyInfoWrapperPass)
INITIALIZE_PASS_END(ModuleSummaryIndexWrapperPass, "module-summary-analysis",		INITIALIZE_PASS_END(ModuleSummaryIndexWrapperPass, "module-summary-analysis",
"Module Summary Analysis", false, true)		"Module Summary Analysis", false, true)

ModulePass *llvm::createModuleSummaryIndexWrapperPass() {		ModulePass *llvm::createModuleSummaryIndexWrapperPass() {
return new ModuleSummaryIndexWrapperPass();		return new ModuleSummaryIndexWrapperPass();
}		}

ModuleSummaryIndexWrapperPass::ModuleSummaryIndexWrapperPass()		ModuleSummaryIndexWrapperPass::ModuleSummaryIndexWrapperPass()
: ModulePass(ID) {		: ModulePass(ID) {
initializeModuleSummaryIndexWrapperPassPass(*PassRegistry::getPassRegistry());		initializeModuleSummaryIndexWrapperPassPass(*PassRegistry::getPassRegistry());
}		}

bool ModuleSummaryIndexWrapperPass::runOnModule(Module &M) {		bool ModuleSummaryIndexWrapperPass::runOnModule(Module &M) {
Index = buildModuleSummaryIndex(M, [this](const Function &F) {		auto &PSI = *getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI(M);
		Index = buildModuleSummaryIndex(
		M,
		[this](const Function &F) {
return &(this->getAnalysis<BlockFrequencyInfoWrapperPass>(		return &(this->getAnalysis<BlockFrequencyInfoWrapperPass>(
*const_cast<Function *>(&F))		*const_cast<Function *>(&F))
.getBFI());		.getBFI());
});		},
		&PSI);
return false;		return false;
}		}

bool ModuleSummaryIndexWrapperPass::doFinalization(Module &M) {		bool ModuleSummaryIndexWrapperPass::doFinalization(Module &M) {
Index.reset();		Index.reset();
return false;		return false;
}		}

void ModuleSummaryIndexWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {		void ModuleSummaryIndexWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesAll();		AU.setPreservesAll();
AU.addRequired<BlockFrequencyInfoWrapperPass>();		AU.addRequired<BlockFrequencyInfoWrapperPass>();
		AU.addRequired<ProfileSummaryInfoWrapperPass>();
}		}

bool llvm::moduleCanBeRenamedForThinLTO(const Module &M) {		bool llvm::moduleCanBeRenamedForThinLTO(const Module &M) {
// We cannot currently promote or rename anything used in inline assembly,		// We cannot currently promote or rename anything used in inline assembly,
// which are not visible to the compiler. Detect a possible case by looking		// which are not visible to the compiler. Detect a possible case by looking
// for a llvm.used local value, in conjunction with an inline assembly call		// for a llvm.used local value, in conjunction with an inline assembly call
// in the module. Prevent importing of any modules containing these uses by		// in the module. Prevent importing of any modules containing these uses by
// suppressing generation of the index. This also prevents importing		// suppressing generation of the index. This also prevents importing
(31 lines not shown)

llvm/trunk/lib/Bitcode/Reader/BitcodeReader.cpp

(645 lines not shown)	private:
std::error_code parseValueSymbolTable(		std::error_code parseValueSymbolTable(
uint64_t Offset,		uint64_t Offset,
DenseMap<unsigned, GlobalValue::LinkageTypes> &ValueIdToLinkageMap);		DenseMap<unsigned, GlobalValue::LinkageTypes> &ValueIdToLinkageMap);
std::error_code parseEntireSummary();		std::error_code parseEntireSummary();
std::error_code parseModuleStringTable();		std::error_code parseModuleStringTable();
std::pair<GlobalValue::GUID, GlobalValue::GUID>		std::pair<GlobalValue::GUID, GlobalValue::GUID>

getGUIDFromValueId(unsigned ValueId);		getGUIDFromValueId(unsigned ValueId);
		std::pair<GlobalValue::GUID, CalleeInfo::HotnessType>
		readCallGraphEdge(const SmallVector<uint64_t, 64> &Record, unsigned int &I,
		bool IsOldProfileFormat, bool HasProfile);
};		};

} // end anonymous namespace		} // end anonymous namespace

BitcodeDiagnosticInfo::BitcodeDiagnosticInfo(std::error_code EC,		BitcodeDiagnosticInfo::BitcodeDiagnosticInfo(std::error_code EC,
DiagnosticSeverity Severity,		DiagnosticSeverity Severity,
const Twine &Msg)		const Twine &Msg)
: DiagnosticInfo(DK_Bitcode, Severity), Msg(Msg), EC(EC) {}		: DiagnosticInfo(DK_Bitcode, Severity), Msg(Msg), EC(EC) {}
▲ Show 20 Lines • Show All 5,551 Lines • ▼ Show 20 Lines	std::error_code ModuleSummaryIndexBitcodeReader::parseEntireSummary() {
{		{
BitstreamEntry Entry = Stream.advanceSkippingSubblocks();		BitstreamEntry Entry = Stream.advanceSkippingSubblocks();
if (Entry.Kind != BitstreamEntry::Record)		if (Entry.Kind != BitstreamEntry::Record)
return error("Invalid Summary Block: record for version expected");		return error("Invalid Summary Block: record for version expected");
if (Stream.readRecord(Entry.ID, Record) != bitc::FS_VERSION)		if (Stream.readRecord(Entry.ID, Record) != bitc::FS_VERSION)
return error("Invalid Summary Block: version expected");		return error("Invalid Summary Block: version expected");
}		}
const uint64_t Version = Record[0];		const uint64_t Version = Record[0];
if (Version != 1)		const bool IsOldProfileFormat = Version == 1;
return error("Invalid summary version " + Twine(Version) + ", 1 expected");		if (!IsOldProfileFormat && Version != 2)
		return error("Invalid summary version " + Twine(Version) +
		", 1 or 2 expected");
Record.clear();		Record.clear();

// Keep around the last seen summary to be used when we see an optional		// Keep around the last seen summary to be used when we see an optional
// "OriginalName" attachement.		// "OriginalName" attachement.
GlobalValueSummary *LastSeenSummary = nullptr;		GlobalValueSummary *LastSeenSummary = nullptr;
bool Combined = false;		bool Combined = false;

while (true) {		while (true) {
Show All 28 Lines	while (true) {
// in the combined index VST entries). The records also contain		// in the combined index VST entries). The records also contain
// information used for ThinLTO renaming and importing.		// information used for ThinLTO renaming and importing.
Record.clear();		Record.clear();
auto BitCode = Stream.readRecord(Entry.ID, Record);		auto BitCode = Stream.readRecord(Entry.ID, Record);
switch (BitCode) {		switch (BitCode) {
default: // Default behavior: ignore.		default: // Default behavior: ignore.
break;		break;
// FS_PERMODULE: [valueid, flags, instcount, numrefs, numrefs x valueid,		// FS_PERMODULE: [valueid, flags, instcount, numrefs, numrefs x valueid,
// n x (valueid, callsitecount)]		// n x (valueid)]
// FS_PERMODULE_PROFILE: [valueid, flags, instcount, numrefs,		// FS_PERMODULE_PROFILE: [valueid, flags, instcount, numrefs,
// numrefs x valueid,		// numrefs x valueid,
// n x (valueid, callsitecount, profilecount)]		// n x (valueid, hotness)]
case bitc::FS_PERMODULE:		case bitc::FS_PERMODULE:
case bitc::FS_PERMODULE_PROFILE: {		case bitc::FS_PERMODULE_PROFILE: {
unsigned ValueID = Record[0];		unsigned ValueID = Record[0];
uint64_t RawFlags = Record[1];		uint64_t RawFlags = Record[1];
unsigned InstCount = Record[2];		unsigned InstCount = Record[2];
unsigned NumRefs = Record[3];		unsigned NumRefs = Record[3];
auto Flags = getDecodedGVSummaryFlags(RawFlags, Version);		auto Flags = getDecodedGVSummaryFlags(RawFlags, Version);
std::unique_ptr<FunctionSummary> FS =		std::unique_ptr<FunctionSummary> FS =
Show All 12 Lines	case bitc::FS_PERMODULE_PROFILE: {
for (unsigned I = 4, E = CallGraphEdgeStartIndex; I != E; ++I) {		for (unsigned I = 4, E = CallGraphEdgeStartIndex; I != E; ++I) {
unsigned RefValueId = Record[I];		unsigned RefValueId = Record[I];
GlobalValue::GUID RefGUID = getGUIDFromValueId(RefValueId).first;		GlobalValue::GUID RefGUID = getGUIDFromValueId(RefValueId).first;
FS->addRefEdge(RefGUID);		FS->addRefEdge(RefGUID);
}		}
bool HasProfile = (BitCode == bitc::FS_PERMODULE_PROFILE);		bool HasProfile = (BitCode == bitc::FS_PERMODULE_PROFILE);
for (unsigned I = CallGraphEdgeStartIndex, E = Record.size(); I != E;		for (unsigned I = CallGraphEdgeStartIndex, E = Record.size(); I != E;
++I) {		++I) {
unsigned CalleeValueId = Record[I];		CalleeInfo::HotnessType Hotness;
unsigned CallsiteCount = Record[++I];		GlobalValue::GUID CalleeGUID;
uint64_t ProfileCount = HasProfile ? Record[++I] : 0;		std::tie(CalleeGUID, Hotness) =
GlobalValue::GUID CalleeGUID = getGUIDFromValueId(CalleeValueId).first;		readCallGraphEdge(Record, I, IsOldProfileFormat, HasProfile);
FS->addCallGraphEdge(CalleeGUID,		FS->addCallGraphEdge(CalleeGUID, CalleeInfo(Hotness));
CalleeInfo(CallsiteCount, ProfileCount));
}		}
auto GUID = getGUIDFromValueId(ValueID);		auto GUID = getGUIDFromValueId(ValueID);
FS->setOriginalName(GUID.second);		FS->setOriginalName(GUID.second);
TheIndex->addGlobalValueSummary(GUID.first, std::move(FS));		TheIndex->addGlobalValueSummary(GUID.first, std::move(FS));
break;		break;
}		}
// FS_ALIAS: [valueid, flags, valueid]		// FS_ALIAS: [valueid, flags, valueid]
// Aliases must be emitted (and parsed) after all FS_PERMODULE entries, as		// Aliases must be emitted (and parsed) after all FS_PERMODULE entries, as
Show All 38 Lines	case bitc::FS_PERMODULE_GLOBALVAR_INIT_REFS: {
FS->addRefEdge(RefGUID);		FS->addRefEdge(RefGUID);
}		}
auto GUID = getGUIDFromValueId(ValueID);		auto GUID = getGUIDFromValueId(ValueID);
FS->setOriginalName(GUID.second);		FS->setOriginalName(GUID.second);
TheIndex->addGlobalValueSummary(GUID.first, std::move(FS));		TheIndex->addGlobalValueSummary(GUID.first, std::move(FS));
break;		break;
}		}
// FS_COMBINED: [valueid, modid, flags, instcount, numrefs,		// FS_COMBINED: [valueid, modid, flags, instcount, numrefs,
// numrefs x valueid, n x (valueid, callsitecount)]		// numrefs x valueid, n x (valueid)]
// FS_COMBINED_PROFILE: [valueid, modid, flags, instcount, numrefs,		// FS_COMBINED_PROFILE: [valueid, modid, flags, instcount, numrefs,
// numrefs x valueid,		// numrefs x valueid, n x (valueid, hotness)]
// n x (valueid, callsitecount, profilecount)]
case bitc::FS_COMBINED:		case bitc::FS_COMBINED:
case bitc::FS_COMBINED_PROFILE: {		case bitc::FS_COMBINED_PROFILE: {
unsigned ValueID = Record[0];		unsigned ValueID = Record[0];
uint64_t ModuleId = Record[1];		uint64_t ModuleId = Record[1];
uint64_t RawFlags = Record[2];		uint64_t RawFlags = Record[2];
unsigned InstCount = Record[3];		unsigned InstCount = Record[3];
unsigned NumRefs = Record[4];		unsigned NumRefs = Record[4];
auto Flags = getDecodedGVSummaryFlags(RawFlags, Version);		auto Flags = getDecodedGVSummaryFlags(RawFlags, Version);
Show All 9 Lines	case bitc::FS_COMBINED_PROFILE: {
++I) {		++I) {
unsigned RefValueId = Record[I];		unsigned RefValueId = Record[I];
GlobalValue::GUID RefGUID = getGUIDFromValueId(RefValueId).first;		GlobalValue::GUID RefGUID = getGUIDFromValueId(RefValueId).first;
FS->addRefEdge(RefGUID);		FS->addRefEdge(RefGUID);
}		}
bool HasProfile = (BitCode == bitc::FS_COMBINED_PROFILE);		bool HasProfile = (BitCode == bitc::FS_COMBINED_PROFILE);
for (unsigned I = CallGraphEdgeStartIndex, E = Record.size(); I != E;		for (unsigned I = CallGraphEdgeStartIndex, E = Record.size(); I != E;
++I) {		++I) {
unsigned CalleeValueId = Record[I];		CalleeInfo::HotnessType Hotness;
unsigned CallsiteCount = Record[++I];		GlobalValue::GUID CalleeGUID;
uint64_t ProfileCount = HasProfile ? Record[++I] : 0;		std::tie(CalleeGUID, Hotness) =
GlobalValue::GUID CalleeGUID = getGUIDFromValueId(CalleeValueId).first;		readCallGraphEdge(Record, I, IsOldProfileFormat, HasProfile);
FS->addCallGraphEdge(CalleeGUID,		FS->addCallGraphEdge(CalleeGUID, CalleeInfo(Hotness));
CalleeInfo(CallsiteCount, ProfileCount));
}		}
GlobalValue::GUID GUID = getGUIDFromValueId(ValueID).first;		GlobalValue::GUID GUID = getGUIDFromValueId(ValueID).first;
TheIndex->addGlobalValueSummary(GUID, std::move(FS));		TheIndex->addGlobalValueSummary(GUID, std::move(FS));
Combined = true;		Combined = true;
break;		break;
}		}
// FS_COMBINED_ALIAS: [valueid, modid, flags, valueid]		// FS_COMBINED_ALIAS: [valueid, modid, flags, valueid]
// Aliases must be emitted (and parsed) after all FS_COMBINED entries, as		// Aliases must be emitted (and parsed) after all FS_COMBINED entries, as
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	case bitc::FS_COMBINED_ORIGINAL_NAME: {
// Reset the LastSeenSummary		// Reset the LastSeenSummary
LastSeenSummary = nullptr;		LastSeenSummary = nullptr;
}		}
}		}
}		}
llvm_unreachable("Exit infinite loop");		llvm_unreachable("Exit infinite loop");
}		}

		std::pair<GlobalValue::GUID, CalleeInfo::HotnessType>
		ModuleSummaryIndexBitcodeReader::readCallGraphEdge(
		const SmallVector<uint64_t, 64> &Record, unsigned int &I,
		const bool IsOldProfileFormat, const bool HasProfile) {

		auto Hotness = CalleeInfo::HotnessType::Unknown;
		unsigned CalleeValueId = Record[I];
		GlobalValue::GUID CalleeGUID = getGUIDFromValueId(CalleeValueId).first;
		if (IsOldProfileFormat) {
		I += 1; // Skip old callsitecount field
		if (HasProfile)
		I += 1; // Skip old profilecount field
		} else if (HasProfile)
		Hotness = static_cast<CalleeInfo::HotnessType>(Record[++I]);
		return {CalleeGUID, Hotness};
		}
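The compatibility logic in `readCallGraphEdge` can be illustrated outside of LLVM. The following is a minimal standalone sketch, not the patch's code: `Hotness`, `CallEdge`, and `decodeCallEdges` are hypothetical names, the enumerator values are illustrative assumptions, and plain `std::vector` stands in for the bitcode record. It assumes that a version 1 record carries `(valueid, callsitecount[, profilecount])` per edge and a version 2 record carries `(valueid[, hotness])`, as the record comments in the diff describe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-in for CalleeInfo::HotnessType; values are assumptions.
enum class Hotness : uint8_t { Unknown = 0, Cold = 1, None = 2, Hot = 3 };

struct CallEdge {
  uint64_t CalleeValueId;
  Hotness H;
};

// Hypothetical helper: decode the call-edge tail of a summary record,
// starting at index Start, for either summary version.
std::vector<CallEdge> decodeCallEdges(const std::vector<uint64_t> &Record,
                                      std::size_t Start,
                                      bool IsOldProfileFormat,
                                      bool HasProfile) {
  // Number of fields stored per edge in this format.
  std::size_t Stride = 1; // valueid
  if (IsOldProfileFormat)
    Stride += HasProfile ? 2 : 1; // callsitecount [+ profilecount], ignored
  else if (HasProfile)
    Stride += 1; // hotness

  assert(Start <= Record.size() && (Record.size() - Start) % Stride == 0 &&
         "malformed call-edge list");

  std::vector<CallEdge> Edges;
  for (std::size_t I = Start; I < Record.size(); I += Stride) {
    Hotness H = Hotness::Unknown;
    // Only the new format stores hotness; old counts are skipped.
    if (!IsOldProfileFormat && HasProfile)
      H = static_cast<Hotness>(Record[I + 1]);
    Edges.push_back({Record[I], H});
  }
  return Edges;
}
```

With this sketch, an old-format profile record decodes to edges with `Hotness::Unknown`, which mirrors how the reader in the diff drops the old counts rather than trying to convert them.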

// Parse the module string table block into the Index.		// Parse the module string table block into the Index.
// This populates the ModulePathStringTable map in the index.		// This populates the ModulePathStringTable map in the index.
std::error_code ModuleSummaryIndexBitcodeReader::parseModuleStringTable() {		std::error_code ModuleSummaryIndexBitcodeReader::parseModuleStringTable() {
if (Stream.EnterSubBlock(bitc::MODULE_STRTAB_BLOCK_ID))		if (Stream.EnterSubBlock(bitc::MODULE_STRTAB_BLOCK_ID))
return error("Invalid record");		return error("Invalid record");

SmallVector<uint64_t, 64> Record;		SmallVector<uint64_t, 64> Record;

llvm/trunk/lib/Bitcode/Writer/BitcodeWriter.cpp

Show First 20 Lines • Show All 3,287 Lines • ▼ Show 20 Lines	void ModuleBitcodeWriter::writePerModuleFunctionSummaryRecord(
std::sort(Calls.begin(), Calls.end(),		std::sort(Calls.begin(), Calls.end(),
[this](const FunctionSummary::EdgeTy &L,		[this](const FunctionSummary::EdgeTy &L,
const FunctionSummary::EdgeTy &R) {		const FunctionSummary::EdgeTy &R) {
return getValueId(L.first) < getValueId(R.first);		return getValueId(L.first) < getValueId(R.first);
});		});
bool HasProfileData = F.getEntryCount().hasValue();		bool HasProfileData = F.getEntryCount().hasValue();
for (auto &ECI : Calls) {		for (auto &ECI : Calls) {
NameVals.push_back(getValueId(ECI.first));		NameVals.push_back(getValueId(ECI.first));
assert(ECI.second.CallsiteCount > 0 && "Expected at least one callsite");
NameVals.push_back(ECI.second.CallsiteCount);
if (HasProfileData)		if (HasProfileData)
NameVals.push_back(ECI.second.ProfileCount);		NameVals.push_back(static_cast<uint8_t>(ECI.second.Hotness));
}		}

unsigned FSAbbrev = (HasProfileData ? FSCallsProfileAbbrev : FSCallsAbbrev);		unsigned FSAbbrev = (HasProfileData ? FSCallsProfileAbbrev : FSCallsAbbrev);
unsigned Code =		unsigned Code =
(HasProfileData ? bitc::FS_PERMODULE_PROFILE : bitc::FS_PERMODULE);		(HasProfileData ? bitc::FS_PERMODULE_PROFILE : bitc::FS_PERMODULE);

// Emit the finished record.		// Emit the finished record.
Stream.EmitRecord(Code, NameVals, FSAbbrev);		Stream.EmitRecord(Code, NameVals, FSAbbrev);
Show All 23 Lines	void ModuleBitcodeWriter::writeModuleLevelReferences(
Stream.EmitRecord(bitc::FS_PERMODULE_GLOBALVAR_INIT_REFS, NameVals,		Stream.EmitRecord(bitc::FS_PERMODULE_GLOBALVAR_INIT_REFS, NameVals,
FSModRefsAbbrev);		FSModRefsAbbrev);
NameVals.clear();		NameVals.clear();
}		}

// Current version for the summary.		// Current version for the summary.
// This is bumped whenever we introduce changes in the way some record are		// This is bumped whenever we introduce changes in the way some record are
// interpreted, like flags for instance.		// interpreted, like flags for instance.
static const uint64_t INDEX_VERSION = 1;		static const uint64_t INDEX_VERSION = 2;

/// Emit the per-module summary section alongside the rest of		/// Emit the per-module summary section alongside the rest of
/// the module's bitcode.		/// the module's bitcode.
void ModuleBitcodeWriter::writePerModuleGlobalValueSummary() {		void ModuleBitcodeWriter::writePerModuleGlobalValueSummary() {
Stream.EnterSubblock(bitc::GLOBALVAL_SUMMARY_BLOCK_ID, 4);		Stream.EnterSubblock(bitc::GLOBALVAL_SUMMARY_BLOCK_ID, 4);

Stream.EmitRecord(bitc::FS_VERSION, ArrayRef<uint64_t>{INDEX_VERSION});		Stream.EmitRecord(bitc::FS_VERSION, ArrayRef<uint64_t>{INDEX_VERSION});

if (Index->begin() == Index->end()) {		if (Index->begin() == Index->end()) {
Stream.ExitBlock();		Stream.ExitBlock();
return;		return;
}		}

// Abbrev for FS_PERMODULE.		// Abbrev for FS_PERMODULE.
BitCodeAbbrev *Abbv = new BitCodeAbbrev();		BitCodeAbbrev *Abbv = new BitCodeAbbrev();
Abbv->Add(BitCodeAbbrevOp(bitc::FS_PERMODULE));		Abbv->Add(BitCodeAbbrevOp(bitc::FS_PERMODULE));
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // flags		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // flags
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // instcount		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // instcount
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 4)); // numrefs		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 4)); // numrefs
// numrefs x valueid, n x (valueid, callsitecount)		// numrefs x valueid, n x (valueid)
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
unsigned FSCallsAbbrev = Stream.EmitAbbrev(Abbv);		unsigned FSCallsAbbrev = Stream.EmitAbbrev(Abbv);

// Abbrev for FS_PERMODULE_PROFILE.		// Abbrev for FS_PERMODULE_PROFILE.
Abbv = new BitCodeAbbrev();		Abbv = new BitCodeAbbrev();
Abbv->Add(BitCodeAbbrevOp(bitc::FS_PERMODULE_PROFILE));		Abbv->Add(BitCodeAbbrevOp(bitc::FS_PERMODULE_PROFILE));
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // flags		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // flags
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // instcount		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // instcount
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 4)); // numrefs		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 4)); // numrefs
// numrefs x valueid, n x (valueid, callsitecount, profilecount)		// numrefs x valueid, n x (valueid, hotness)
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
unsigned FSCallsProfileAbbrev = Stream.EmitAbbrev(Abbv);		unsigned FSCallsProfileAbbrev = Stream.EmitAbbrev(Abbv);

// Abbrev for FS_PERMODULE_GLOBALVAR_INIT_REFS.		// Abbrev for FS_PERMODULE_GLOBALVAR_INIT_REFS.
Abbv = new BitCodeAbbrev();		Abbv = new BitCodeAbbrev();
Abbv->Add(BitCodeAbbrevOp(bitc::FS_PERMODULE_GLOBALVAR_INIT_REFS));		Abbv->Add(BitCodeAbbrevOp(bitc::FS_PERMODULE_GLOBALVAR_INIT_REFS));
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid
▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
// Abbrev for FS_COMBINED.		// Abbrev for FS_COMBINED.
BitCodeAbbrev *Abbv = new BitCodeAbbrev();		BitCodeAbbrev *Abbv = new BitCodeAbbrev();
Abbv->Add(BitCodeAbbrevOp(bitc::FS_COMBINED));		Abbv->Add(BitCodeAbbrevOp(bitc::FS_COMBINED));
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // modid		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // modid
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // flags		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // flags
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // instcount		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // instcount
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 4)); // numrefs		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 4)); // numrefs
// numrefs x valueid, n x (valueid, callsitecount)		// numrefs x valueid, n x (valueid)
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
unsigned FSCallsAbbrev = Stream.EmitAbbrev(Abbv);		unsigned FSCallsAbbrev = Stream.EmitAbbrev(Abbv);

// Abbrev for FS_COMBINED_PROFILE.		// Abbrev for FS_COMBINED_PROFILE.
Abbv = new BitCodeAbbrev();		Abbv = new BitCodeAbbrev();
Abbv->Add(BitCodeAbbrevOp(bitc::FS_COMBINED_PROFILE));		Abbv->Add(BitCodeAbbrevOp(bitc::FS_COMBINED_PROFILE));
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // modid		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // modid
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // flags		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // flags
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // instcount		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // instcount
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 4)); // numrefs		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 4)); // numrefs
// numrefs x valueid, n x (valueid, callsitecount, profilecount)		// numrefs x valueid, n x (valueid, hotness)
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
unsigned FSCallsProfileAbbrev = Stream.EmitAbbrev(Abbv);		unsigned FSCallsProfileAbbrev = Stream.EmitAbbrev(Abbv);

// Abbrev for FS_COMBINED_GLOBALVAR_INIT_REFS.		// Abbrev for FS_COMBINED_GLOBALVAR_INIT_REFS.
Abbv = new BitCodeAbbrev();		Abbv = new BitCodeAbbrev();
Abbv->Add(BitCodeAbbrevOp(bitc::FS_COMBINED_GLOBALVAR_INIT_REFS));		Abbv->Add(BitCodeAbbrevOp(bitc::FS_COMBINED_GLOBALVAR_INIT_REFS));
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid		Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // valueid
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	for (const auto &I : *this) {
NameVals.push_back(FS->refs().size());		NameVals.push_back(FS->refs().size());

for (auto &RI : FS->refs()) {		for (auto &RI : FS->refs()) {
NameVals.push_back(getValueId(RI.getGUID()));		NameVals.push_back(getValueId(RI.getGUID()));
}		}

bool HasProfileData = false;		bool HasProfileData = false;
for (auto &EI : FS->calls()) {		for (auto &EI : FS->calls()) {
HasProfileData |= EI.second.ProfileCount != 0;		HasProfileData |= EI.second.Hotness != CalleeInfo::HotnessType::Unknown;
if (HasProfileData)		if (HasProfileData)
break;		break;
}		}

for (auto &EI : FS->calls()) {		for (auto &EI : FS->calls()) {
// If this GUID doesn't have a value id, it doesn't have a function		// If this GUID doesn't have a value id, it doesn't have a function
// summary and we don't need to record any calls to it.		// summary and we don't need to record any calls to it.
if (!hasValueId(EI.first.getGUID()))		if (!hasValueId(EI.first.getGUID()))
continue;		continue;
NameVals.push_back(getValueId(EI.first.getGUID()));		NameVals.push_back(getValueId(EI.first.getGUID()));
assert(EI.second.CallsiteCount > 0 && "Expected at least one callsite");
NameVals.push_back(EI.second.CallsiteCount);
if (HasProfileData)		if (HasProfileData)
NameVals.push_back(EI.second.ProfileCount);		NameVals.push_back(static_cast<uint8_t>(EI.second.Hotness));
}		}

unsigned FSAbbrev = (HasProfileData ? FSCallsProfileAbbrev : FSCallsAbbrev);		unsigned FSAbbrev = (HasProfileData ? FSCallsProfileAbbrev : FSCallsAbbrev);
unsigned Code =		unsigned Code =
(HasProfileData ? bitc::FS_COMBINED_PROFILE : bitc::FS_COMBINED);		(HasProfileData ? bitc::FS_COMBINED_PROFILE : bitc::FS_COMBINED);

// Emit the finished record.		// Emit the finished record.
Stream.EmitRecord(Code, NameVals, FSAbbrev);		Stream.EmitRecord(Code, NameVals, FSAbbrev);

llvm/trunk/lib/LTO/ThinLTOCodeGenerator.cpp

#ifdef HAVE_LLVM_REVISION		#ifdef HAVE_LLVM_REVISION
#include "LLVMLTORevision.h"		#include "LLVMLTORevision.h"
#endif		#endif

#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/Analysis/ModuleSummaryAnalysis.h"		#include "llvm/Analysis/ModuleSummaryAnalysis.h"
		#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Bitcode/BitcodeWriterPass.h"		#include "llvm/Bitcode/BitcodeWriterPass.h"
#include "llvm/Bitcode/ReaderWriter.h"		#include "llvm/Bitcode/ReaderWriter.h"
#include "llvm/ExecutionEngine/ObjectMemoryBuffer.h"		#include "llvm/ExecutionEngine/ObjectMemoryBuffer.h"
#include "llvm/IR/DiagnosticPrinter.h"		#include "llvm/IR/DiagnosticPrinter.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/LegacyPassManager.h"		#include "llvm/IR/LegacyPassManager.h"
▲ Show 20 Lines • Show All 340 Lines • ▼ Show 20 Lines	ProcessThinLTOModule(Module &TheModule, ModuleSummaryIndex &Index,

saveTempBitcode(TheModule, SaveTempsDir, count, ".4.opt.bc");		saveTempBitcode(TheModule, SaveTempsDir, count, ".4.opt.bc");

if (DisableCodeGen) {		if (DisableCodeGen) {
// Configured to stop before CodeGen, serialize the bitcode and return.		// Configured to stop before CodeGen, serialize the bitcode and return.
SmallVector<char, 128> OutputBuffer;		SmallVector<char, 128> OutputBuffer;
{		{
raw_svector_ostream OS(OutputBuffer);		raw_svector_ostream OS(OutputBuffer);
auto Index = buildModuleSummaryIndex(TheModule);		ProfileSummaryInfo PSI(TheModule);
		auto Index = buildModuleSummaryIndex(TheModule, nullptr, nullptr);
WriteBitcodeToFile(&TheModule, OS, true, &Index);		WriteBitcodeToFile(&TheModule, OS, true, &Index);
}		}
return make_unique<ObjectMemoryBuffer>(std::move(OutputBuffer));		return make_unique<ObjectMemoryBuffer>(std::move(OutputBuffer));
}		}

return codegenModule(TheModule, TM);		return codegenModule(TheModule, TM);
}		}

llvm/trunk/lib/Transforms/IPO/FunctionImport.cpp

Show First 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	static cl::opt<unsigned> ImportInstrLimit(
cl::desc("Only import functions with less than N instructions"));		cl::desc("Only import functions with less than N instructions"));

static cl::opt<float>		static cl::opt<float>
ImportInstrFactor("import-instr-evolution-factor", cl::init(0.7),		ImportInstrFactor("import-instr-evolution-factor", cl::init(0.7),
cl::Hidden, cl::value_desc("x"),		cl::Hidden, cl::value_desc("x"),
cl::desc("As we import functions, multiply the "		cl::desc("As we import functions, multiply the "
"`import-instr-limit` threshold by this factor "		"`import-instr-limit` threshold by this factor "
"before processing newly imported functions"));		"before processing newly imported functions"));
		static cl::opt<float> ImportHotMultiplier(
		"import-hot-multiplier", cl::init(3.0), cl::Hidden, cl::value_desc("x"),
		cl::ZeroOrMore, cl::desc("Multiply the `import-instr-limit` threshold for "
		"hot callsites"));

static cl::opt<bool> PrintImports("print-imports", cl::init(false), cl::Hidden,		static cl::opt<bool> PrintImports("print-imports", cl::init(false), cl::Hidden,
cl::desc("Print imported functions"));		cl::desc("Print imported functions"));

// Temporary allows the function import pass to disable always linking		// Temporary allows the function import pass to disable always linking
// referenced discardable symbols.		// referenced discardable symbols.
static cl::opt<bool>		static cl::opt<bool>
DontForceImportReferencedDiscardableSymbols("disable-force-link-odr",		DontForceImportReferencedDiscardableSymbols("disable-force-link-odr",
▲ Show 20 Lines • Show All 204 Lines • ▼ Show 20 Lines

using EdgeInfo = std::pair<const FunctionSummary *, unsigned /* Threshold */>;		using EdgeInfo = std::pair<const FunctionSummary *, unsigned /* Threshold */>;

/// Compute the list of functions to import for a given caller. Mark these		/// Compute the list of functions to import for a given caller. Mark these
/// imported functions and the symbols they reference in their source module as		/// imported functions and the symbols they reference in their source module as
/// exported from their source module.		/// exported from their source module.
static void computeImportForFunction(		static void computeImportForFunction(
const FunctionSummary &Summary, const ModuleSummaryIndex &Index,		const FunctionSummary &Summary, const ModuleSummaryIndex &Index,
unsigned Threshold, const GVSummaryMapTy &DefinedGVSummaries,		const unsigned Threshold, const GVSummaryMapTy &DefinedGVSummaries,
SmallVectorImpl<EdgeInfo> &Worklist,		SmallVectorImpl<EdgeInfo> &Worklist,
FunctionImporter::ImportMapTy &ImportList,		FunctionImporter::ImportMapTy &ImportList,
StringMap<FunctionImporter::ExportSetTy> *ExportLists = nullptr) {		StringMap<FunctionImporter::ExportSetTy> *ExportLists = nullptr) {
for (auto &Edge : Summary.calls()) {		for (auto &Edge : Summary.calls()) {
auto GUID = Edge.first.getGUID();		auto GUID = Edge.first.getGUID();
DEBUG(dbgs() << " edge -> " << GUID << " Threshold:" << Threshold << "\n");		DEBUG(dbgs() << " edge -> " << GUID << " Threshold:" << Threshold << "\n");

if (DefinedGVSummaries.count(GUID)) {		if (DefinedGVSummaries.count(GUID)) {
DEBUG(dbgs() << "ignored! Target already in destination module.\n");		DEBUG(dbgs() << "ignored! Target already in destination module.\n");
continue;		continue;
}		}

auto *CalleeSummary = selectCallee(GUID, Threshold, Index);		// FIXME: Also lower the threshold for cold callsites.
		const auto NewThreshold =
		Edge.second.Hotness == CalleeInfo::HotnessType::Hot
		? Threshold * ImportHotMultiplier
		mehdi_amini: This does not seem correct to me. The multiplier should be applied only to the initial "seed" of the call chain, not at every step of the chain.
		tejohnson: The multiplier is applied to each call independently, based on whether it is marked hot. E.g. if we have A -> B (hot call), and B then has two calls, B -> C1 (hot call) and B -> C2 (cold call). I assume you are referring to A as the "seed" of the call chain? We want to treat the two different calls from B differently; it doesn't matter that the call from A -> B is hot.
		mehdi_amini: OK, I can see the logic, but I'm not convinced by the multiplier effect. Right now, as you get further from the original function, the threshold would increase, since if I follow correctly we multiply by 3 here and by 0.7 later? I may be missing something here. With a sequence A->B->C->D (all hot), the threshold evolves this way:
		call to B: 100
		call to C: 100*3*0.7 = 210
		call to D: 210*3*0.7 = 441
		I wonder if a bonus wouldn't be more appropriate.
		tejohnson: > With a sequence A->B->C->D (all hot), the threshold evolves this way: call to B: 100, call to C: 100*3*0.7 = 210, call to D: 210*3*0.7 = 441
		This isn't the case, since the (original) Threshold, not NewThreshold, is pushed into the Worklist. So the next level of callees again starts with the original threshold, which is then decayed before being passed in here and multiplied. So in your example above, unless I am missing something, we get:
		call to B: 100*3 = 300
		call to C: 100*0.7*3 = 210
		call to D: 100*0.7*0.7*3 = 147
		mehdi_amini: OK, you're right, I got confused by the name `NewThreshold`. I still wonder if we shouldn't push another threshold with a bonus onto the stack. In fact, this is kind of what D24976 attempts to do.
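As a worked check of the arithmetic in this thread, here is a minimal standalone sketch of how the effective threshold evolves along an all-hot call chain when the un-multiplied threshold is what gets carried to the next level, as tejohnson describes. `hotChainThresholds` is a hypothetical helper, not part of the patch, and it rounds to the nearest integer for readability; the real code's integer conversion may differ.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: effective import threshold at each depth of a call
// chain in which every edge is hot.
//   Limit         - initial threshold (e.g. the value of import-instr-limit)
//   Decay         - per-level factor (e.g. import-instr-evolution-factor)
//   HotMultiplier - factor applied to hot edges (e.g. import-hot-multiplier)
std::vector<long> hotChainThresholds(double Limit, double Decay,
                                     double HotMultiplier, unsigned Depth) {
  assert(Limit >= 0 && Decay > 0 && HotMultiplier > 0);
  std::vector<long> Result;
  double Base = Limit; // un-multiplied threshold carried between levels
  for (unsigned I = 0; I < Depth; ++I) {
    // The hot multiplier affects only the threshold used for this edge.
    Result.push_back(std::lround(Base * HotMultiplier));
    // The next level starts from the decayed base, not the multiplied value.
    Base *= Decay;
  }
  return Result;
}
```

Calling `hotChainThresholds(100, 0.7, 3.0, 3)` reproduces the sequence 300, 210, 147 from the reply above, so the multiplier does not compound across levels.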
		: Threshold;
		auto *CalleeSummary = selectCallee(GUID, NewThreshold, Index);
if (!CalleeSummary) {		if (!CalleeSummary) {
DEBUG(dbgs() << "ignored! No qualifying callee with summary found.\n");		DEBUG(dbgs() << "ignored! No qualifying callee with summary found.\n");
continue;		continue;
}		}
// "Resolve" the summary, traversing alias,		// "Resolve" the summary, traversing alias,
const FunctionSummary *ResolvedCalleeSummary;		const FunctionSummary *ResolvedCalleeSummary;
if (isa<AliasSummary>(CalleeSummary)) {		if (isa<AliasSummary>(CalleeSummary)) {
ResolvedCalleeSummary = cast<FunctionSummary>(		ResolvedCalleeSummary = cast<FunctionSummary>(
&cast<AliasSummary>(CalleeSummary)->getAliasee());		&cast<AliasSummary>(CalleeSummary)->getAliasee());
assert(		assert(
GlobalValue::isLinkOnceODRLinkage(ResolvedCalleeSummary->linkage()) &&		GlobalValue::isLinkOnceODRLinkage(ResolvedCalleeSummary->linkage()) &&
"Unexpected alias to a non-linkonceODR in import list");		"Unexpected alias to a non-linkonceODR in import list");
} else		} else
ResolvedCalleeSummary = cast<FunctionSummary>(CalleeSummary);		ResolvedCalleeSummary = cast<FunctionSummary>(CalleeSummary);

assert(ResolvedCalleeSummary->instCount() <= Threshold &&		assert(ResolvedCalleeSummary->instCount() <= NewThreshold &&
"selectCallee() didn't honor the threshold");		"selectCallee() didn't honor the threshold");

auto ExportModulePath = ResolvedCalleeSummary->modulePath();		auto ExportModulePath = ResolvedCalleeSummary->modulePath();
auto &ProcessedThreshold = ImportList[ExportModulePath][GUID];		auto &ProcessedThreshold = ImportList[ExportModulePath][GUID];
/// Since the traversal of the call graph is DFS, we can revisit a function		/// Since the traversal of the call graph is DFS, we can revisit a function
/// a second time with a higher threshold. In this case, it is added back to		/// a second time with a higher threshold. In this case, it is added back to
/// the worklist with the new threshold.		/// the worklist with the new threshold.
if (ProcessedThreshold && ProcessedThreshold >= Threshold) {		if (ProcessedThreshold && ProcessedThreshold >= Threshold) {
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	for (auto &GVSummary : DefinedGVSummaries) {
auto *Summary = GVSummary.second;		auto *Summary = GVSummary.second;
if (auto *AS = dyn_cast<AliasSummary>(Summary))		if (auto *AS = dyn_cast<AliasSummary>(Summary))
Summary = &AS->getAliasee();		Summary = &AS->getAliasee();
auto *FuncSummary = dyn_cast<FunctionSummary>(Summary);		auto *FuncSummary = dyn_cast<FunctionSummary>(Summary);
if (!FuncSummary)		if (!FuncSummary)
// Skip import for global variables		// Skip import for global variables
continue;		continue;
DEBUG(dbgs() << "Initialize import for " << GVSummary.first << "\n");		DEBUG(dbgs() << "Initialize import for " << GVSummary.first << "\n");
computeImportForFunction(*FuncSummary, Index, ImportInstrLimit,		computeImportForFunction(*FuncSummary, Index, ImportInstrLimit,
		mehdi_amini: Here the `ImportInstrLimit` can benefit from the multiplier.
		tejohnson: No, since the calls from FuncSummary will be treated appropriately if they are hot.
DefinedGVSummaries, Worklist, ImportList,		DefinedGVSummaries, Worklist, ImportList,
ExportLists);		ExportLists);
}		}

while (!Worklist.empty()) {		while (!Worklist.empty()) {
auto FuncInfo = Worklist.pop_back_val();		auto FuncInfo = Worklist.pop_back_val();
auto *Summary = FuncInfo.first;		auto *Summary = FuncInfo.first;
auto Threshold = FuncInfo.second;		auto Threshold = FuncInfo.second;
[444 lines not shown]
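
Note on the DFS comment in the hunk above: a callee can be reached first along a long call chain and later along a shorter one, in which case it is queued again with the larger threshold. The following is a minimal sketch of that bookkeeping, not the code from this patch. `Node`, `Graph`, `computeImports`, and the halving of the threshold at each call level are hypothetical stand-ins chosen for illustration; the real importer works on summary types and may scale thresholds differently.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a function summary: size plus callee names.
struct Node {
  unsigned InstCount;
  std::vector<std::string> Calls;
};
using Graph = std::map<std::string, Node>;

// Returns, per imported callee, the largest threshold it was queued with.
std::map<std::string, unsigned> computeImports(const Graph &G,
                                               const std::string &Root,
                                               unsigned Limit) {
  assert(G.count(Root) && "root must be part of the graph");
  std::map<std::string, unsigned> Processed;
  std::vector<std::pair<std::string, unsigned>> Worklist;
  Worklist.emplace_back(Root, Limit);
  while (!Worklist.empty()) {
    std::pair<std::string, unsigned> Item = Worklist.back();
    Worklist.pop_back();
    auto Caller = G.find(Item.first);
    if (Caller == G.end())
      continue;
    for (const std::string &Name : Caller->second.Calls) {
      auto Callee = G.find(Name);
      if (Callee == G.end() || Callee->second.InstCount > Item.second)
        continue; // No definition, or too large for this threshold.
      unsigned Next = Item.second / 2; // Assumed decay per call level.
      auto Seen = Processed.find(Name);
      if (Seen != Processed.end() && Seen->second >= Next)
        continue; // Already handled with an equal or higher threshold.
      Processed[Name] = Next; // First visit, or revisit with a higher value.
      Worklist.emplace_back(Name, Next);
    }
  }
  return Processed;
}
```

With a graph in which `C` is reachable both through a three-call chain and through a two-call chain, the second visit raises the threshold recorded for `C`, and a callee of `C` that was rejected on the first visit can then be accepted.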

llvm/trunk/test/Bitcode/Inputs/thinlto-function-summary-callgraph-combined.1.bc

This is a binary file.

llvm/trunk/test/Bitcode/Inputs/thinlto-function-summary-callgraph-pgo-combined.1.bc

This is a binary file.

llvm/trunk/test/Bitcode/Inputs/thinlto-function-summary-callgraph-pgo.1.bc

This is a binary file.

llvm/trunk/test/Bitcode/Inputs/thinlto-function-summary-callgraph-profile-summary.ll

				; ModuleID = 'thinlto-function-summary-callgraph-profile-summary2.ll'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"


				define void @hot1() #1 {
				ret void
				}
				define void @hot2() #1 {
				ret void
				}
				define void @hot3() #1 {
				ret void
				}
				define void @cold() #1 {
				ret void
				}
				define void @none1() #1 {
				ret void
				}
				define void @none2() #1 {
				ret void
				}
				define void @none3() #1 {
				ret void
				}

llvm/trunk/test/Bitcode/Inputs/thinlto-function-summary-callgraph.1.bc

This is a binary file.

llvm/trunk/test/Bitcode/summary_version.ll

	; Check summary versioning			; Check summary versioning
	; RUN: opt -module-summary %s -o - | llvm-bcanalyzer -dump | FileCheck %s			; RUN: opt -module-summary %s -o - | llvm-bcanalyzer -dump | FileCheck %s

	; CHECK: <GLOBALVAL_SUMMARY_BLOCK			; CHECK: <GLOBALVAL_SUMMARY_BLOCK
	; CHECK: <VERSION op0=1/>			; CHECK: <VERSION op0=2/>



	; Need a function for the summary to be populated.			; Need a function for the summary to be populated.
	define void @foo() {			define void @foo() {
	ret void			ret void
	}			}

llvm/trunk/test/Bitcode/thinlto-alias.ll

	; Test to check the callgraph in summary			; Test to check the callgraph in summary
	; RUN: opt -module-summary %s -o %t.o			; RUN: opt -module-summary %s -o %t.o
	; RUN: llvm-bcanalyzer -dump %t.o | FileCheck %s			; RUN: llvm-bcanalyzer -dump %t.o | FileCheck %s
	; RUN: opt -module-summary %p/Inputs/thinlto-alias.ll -o %t2.o			; RUN: opt -module-summary %p/Inputs/thinlto-alias.ll -o %t2.o
	; RUN: llvm-lto -thinlto -o %t3 %t.o %t2.o			; RUN: llvm-lto -thinlto -o %t3 %t.o %t2.o
	; RUN: llvm-bcanalyzer -dump %t3.thinlto.bc | FileCheck %s --check-prefix=COMBINED			; RUN: llvm-bcanalyzer -dump %t3.thinlto.bc | FileCheck %s --check-prefix=COMBINED

	; CHECK: <GLOBALVAL_SUMMARY_BLOCK			; CHECK: <GLOBALVAL_SUMMARY_BLOCK
	; CHECK-NEXT: <VERSION			; CHECK-NEXT: <VERSION
	; See if the call to func is registered, using the expected callsite count			; See if the call to func is registered, using the expected callsite count
	; and value id matching the subsequent value symbol table.			; and value id matching the subsequent value symbol table.
	; CHECK-NEXT: <PERMODULE {{.*}} op4=[[FUNCID:[0-9]+]] op5=1/>			; CHECK-NEXT: <PERMODULE {{.*}} op4=[[FUNCID:[0-9]+]]/>
	; CHECK-NEXT: </GLOBALVAL_SUMMARY_BLOCK>			; CHECK-NEXT: </GLOBALVAL_SUMMARY_BLOCK>
	; CHECK-NEXT: <VALUE_SYMTAB			; CHECK-NEXT: <VALUE_SYMTAB
	; CHECK-NEXT: <FNENTRY {{.*}} record string = 'main'			; CHECK-NEXT: <FNENTRY {{.*}} record string = 'main'
	; External function analias should have entry with value id FUNCID			; External function analias should have entry with value id FUNCID
	; CHECK-NEXT: <ENTRY {{.*}} op0=[[FUNCID]] {{.*}} record string = 'analias'			; CHECK-NEXT: <ENTRY {{.*}} op0=[[FUNCID]] {{.*}} record string = 'analias'
	; CHECK-NEXT: </VALUE_SYMTAB>			; CHECK-NEXT: </VALUE_SYMTAB>

	; COMBINED: <GLOBALVAL_SUMMARY_BLOCK			; COMBINED: <GLOBALVAL_SUMMARY_BLOCK
	; COMBINED-NEXT: <VERSION			; COMBINED-NEXT: <VERSION
	; See if the call to analias is registered, using the expected callsite count			; See if the call to analias is registered, using the expected callsite count
	; and value id matching the subsequent value symbol table.			; and value id matching the subsequent value symbol table.
	; COMBINED-NEXT: <COMBINED {{.*}} op5=[[ALIASID:[0-9]+]] op6=1/>			; COMBINED-NEXT: <COMBINED {{.*}} op5=[[ALIASID:[0-9]+]]/>
	; Followed by the alias and aliasee			; Followed by the alias and aliasee
	; COMBINED-NEXT: <COMBINED {{.*}}			; COMBINED-NEXT: <COMBINED {{.*}}
	; COMBINED-NEXT: <COMBINED_ALIAS {{.*}} op3=[[ALIASEEID:[0-9]+]]			; COMBINED-NEXT: <COMBINED_ALIAS {{.*}} op3=[[ALIASEEID:[0-9]+]]
	; COMBINED-NEXT: </GLOBALVAL_SUMMARY_BLOCK			; COMBINED-NEXT: </GLOBALVAL_SUMMARY_BLOCK
	; COMBINED-NEXT: <VALUE_SYMTAB			; COMBINED-NEXT: <VALUE_SYMTAB
	; Entry for function func should have entry with value id ALIASID			; Entry for function func should have entry with value id ALIASID
	; COMBINED-NEXT: <COMBINED_ENTRY {{.*}} op0=[[ALIASID]] op1=-5751648690987223394/>			; COMBINED-NEXT: <COMBINED_ENTRY {{.*}} op0=[[ALIASID]] op1=-5751648690987223394/>
	; COMBINED-NEXT: <COMBINED			; COMBINED-NEXT: <COMBINED
	[15 lines not shown]

llvm/trunk/test/Bitcode/thinlto-function-summary-callgraph-pgo.ll

	; Test to check the callgraph in summary when there is PGO			; Test to check the callgraph in summary when there is PGO
	; RUN: opt -module-summary %s -o %t.o			; RUN: opt -module-summary %s -o %t.o
	; RUN: llvm-bcanalyzer -dump %t.o | FileCheck %s			; RUN: llvm-bcanalyzer -dump %t.o | FileCheck %s

	; RUN: opt -module-summary %p/Inputs/thinlto-function-summary-callgraph.ll -o %t2.o			; RUN: opt -module-summary %p/Inputs/thinlto-function-summary-callgraph.ll -o %t2.o
	; RUN: llvm-lto -thinlto -o %t3 %t.o %t2.o			; RUN: llvm-lto -thinlto -o %t3 %t.o %t2.o
	; RUN: llvm-bcanalyzer -dump %t3.thinlto.bc | FileCheck %s --check-prefix=COMBINED			; RUN: llvm-bcanalyzer -dump %t3.thinlto.bc | FileCheck %s --check-prefix=COMBINED

				; Check parsing for old summary versions generated from this file.
				; RUN: llvm-lto -thinlto-index-stats %p/Inputs/thinlto-function-summary-callgraph-pgo.1.bc | FileCheck %s --check-prefix=OLD
				; RUN: llvm-lto -thinlto-index-stats %p/Inputs/thinlto-function-summary-callgraph-pgo-combined.1.bc | FileCheck %s --check-prefix=OLD-COMBINED

	; CHECK: <GLOBALVAL_SUMMARY_BLOCK			; CHECK: <GLOBALVAL_SUMMARY_BLOCK
	; CHECK-NEXT: <VERSION			; CHECK-NEXT: <VERSION
	; See if the call to func is registered, using the expected callsite count			; See if the call to func is registered, using the expected callsite count
	; and profile count, with value id matching the subsequent value symbol table.			; and hotness type, with value id matching the subsequent value symbol table.
	; CHECK-NEXT: <PERMODULE_PROFILE {{.*}} op4=[[FUNCID:[0-9]+]] op5=1 op6=1/>			; CHECK-NEXT: <PERMODULE_PROFILE {{.*}} op4=[[FUNCID:[0-9]+]] op5=2/>
	; CHECK-NEXT: </GLOBALVAL_SUMMARY_BLOCK>			; CHECK-NEXT: </GLOBALVAL_SUMMARY_BLOCK>
	; CHECK-NEXT: <VALUE_SYMTAB			; CHECK-NEXT: <VALUE_SYMTAB
	; CHECK-NEXT: <FNENTRY {{.*}} record string = 'main'			; CHECK-NEXT: <FNENTRY {{.*}} record string = 'main'
	; External function func should have entry with value id FUNCID			; External function func should have entry with value id FUNCID
	; CHECK-NEXT: <ENTRY {{.*}} op0=[[FUNCID]] {{.*}} record string = 'func'			; CHECK-NEXT: <ENTRY {{.*}} op0=[[FUNCID]] {{.*}} record string = 'func'
	; CHECK-NEXT: </VALUE_SYMTAB>			; CHECK-NEXT: </VALUE_SYMTAB>

	; COMBINED: <GLOBALVAL_SUMMARY_BLOCK			; COMBINED: <GLOBALVAL_SUMMARY_BLOCK
	; COMBINED-NEXT: <VERSION			; COMBINED-NEXT: <VERSION
	; COMBINED-NEXT: <COMBINED			; COMBINED-NEXT: <COMBINED
	; See if the call to func is registered, using the expected callsite count			; See if the call to func is registered, using the expected callsite count
	; and profile count, with value id matching the subsequent value symbol table.			; and hotness type, with value id matching the subsequent value symbol table.
	; COMBINED-NEXT: <COMBINED_PROFILE {{.*}} op5=[[FUNCID:[0-9]+]] op6=1 op7=1/>			; op6=2 which is hotnessType::None.
				; COMBINED-NEXT: <COMBINED_PROFILE {{.*}} op5=[[FUNCID:[0-9]+]] op6=2/>
	; COMBINED-NEXT: </GLOBALVAL_SUMMARY_BLOCK>			; COMBINED-NEXT: </GLOBALVAL_SUMMARY_BLOCK>
	; COMBINED-NEXT: <VALUE_SYMTAB			; COMBINED-NEXT: <VALUE_SYMTAB
	; Entry for function func should have entry with value id FUNCID			; Entry for function func should have entry with value id FUNCID
	; COMBINED-NEXT: <COMBINED_ENTRY {{.*}} op0=[[FUNCID]] op1=7289175272376759421/>			; COMBINED-NEXT: <COMBINED_ENTRY {{.*}} op0=[[FUNCID]] op1=7289175272376759421/>
	; COMBINED-NEXT: <COMBINED			; COMBINED-NEXT: <COMBINED
	; COMBINED-NEXT: </VALUE_SYMTAB>			; COMBINED-NEXT: </VALUE_SYMTAB>

	; ModuleID = 'thinlto-function-summary-callgraph.ll'			; ModuleID = 'thinlto-function-summary-callgraph.ll'
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define i32 @main() #0 !prof !2 {			define i32 @main() #0 !prof !2 {
	entry:			entry:
	call void (...) @func()			call void (...) @func()
	ret i32 0			ret i32 0
	}			}

	declare void @func(...) #1			declare void @func(...) #1

	!2 = !{!"function_entry_count", i64 1}			!2 = !{!"function_entry_count", i64 1}

				; OLD: Index {{.*}} contains 1 nodes (1 functions, 0 alias, 0 globals) and 1 edges (0 refs and 1 calls)
				; OLD-COMBINED: Index {{.*}} contains 2 nodes (2 functions, 0 alias, 0 globals) and 1 edges (0 refs and 1 calls)
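
The CHECK-line changes above suggest how the call-edge layout differs between summary versions: version 1 records carry a callsite count (and, with profile data, a profile count) after each callee id, while version 2 records carry only the callee id plus, in the profile variant, a hotness code. A reader that still accepts old bitcode could step over the edge operands roughly as sketched here. This is an illustration under those assumptions, not the BitcodeReader change from this patch; `parseCallEdges` and `CallEdge` are hypothetical names, and using 0 for unknown hotness is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// (callee value id, hotness code); 0 is assumed to mean "unknown".
using CallEdge = std::pair<std::uint64_t, std::uint64_t>;

// Ops holds only the call-edge operands of one summary record.
std::vector<CallEdge> parseCallEdges(const std::vector<std::uint64_t> &Ops,
                                     unsigned Version, bool HasProfile) {
  // Number of operands per edge, as inferred from the test expectations.
  std::size_t Stride;
  if (Version == 1)
    Stride = HasProfile ? 3 : 2; // id, callsite count[, profile count]
  else
    Stride = HasProfile ? 2 : 1; // id[, hotness]

  std::vector<CallEdge> Edges;
  for (std::size_t I = 0; I + Stride <= Ops.size(); I += Stride) {
    // Old records carry raw counts, so no hotness can be recovered from them.
    std::uint64_t Hotness = (Version != 1 && HasProfile) ? Ops[I + 1] : 0;
    Edges.emplace_back(Ops[I], Hotness);
  }
  return Edges;
}
```

Either way the number of edges is preserved, which is all the OLD and OLD-COMBINED checks in this test look at.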

llvm/trunk/test/Bitcode/thinlto-function-summary-callgraph-profile-summary.ll

				; Test to check the callgraph in summary when there is PGO
				; RUN: opt -module-summary %s -o %t.o
				; RUN: llvm-bcanalyzer -dump %t.o | FileCheck %s
				; RUN: opt -module-summary %p/Inputs/thinlto-function-summary-callgraph-profile-summary.ll -o %t2.o
				; RUN: llvm-lto -thinlto -o %t3 %t.o %t2.o
				; RUN: llvm-bcanalyzer -dump %t3.thinlto.bc | FileCheck %s --check-prefix=COMBINED


				; CHECK-LABEL: <GLOBALVAL_SUMMARY_BLOCK
				; CHECK-NEXT: <VERSION
				; See if the call to func is registered, using the expected callsite count
				; and profile count, with value id matching the subsequent value symbol table.
				; CHECK-NEXT: <PERMODULE_PROFILE {{.*}} op4=[[HOT1:.*]] op5=3 op6=[[HOT2:.*]] op7=3 op8=[[HOT3:.*]] op9=3 op10=[[COLD:.*]] op11=1 op12=[[NONE1:.*]] op13=2 op14=[[NONE2:.*]] op15=2 op16=[[NONE3:.*]] op17=2/>
				; CHECK-NEXT: </GLOBALVAL_SUMMARY_BLOCK>
				; CHECK-LABEL: <VALUE_SYMTAB
				; CHECK-NEXT: <FNENTRY {{.*}} record string = 'hot_function
				; CHECK-DAG: <ENTRY abbrevid=6 op0=[[NONE1]] {{.*}} record string = 'none1'
				; CHECK-DAG: <ENTRY abbrevid=6 op0=[[COLD]] {{.*}} record string = 'cold'
				; CHECK-DAG: <ENTRY abbrevid=6 op0=[[NONE2]] {{.*}} record string = 'none2'
				; CHECK-DAG: <ENTRY abbrevid=6 op0=[[NONE3]] {{.*}} record string = 'none3'
				; CHECK-DAG: <ENTRY abbrevid=6 op0=[[HOT1]] {{.*}} record string = 'hot1'
				; CHECK-DAG: <ENTRY abbrevid=6 op0=[[HOT2]] {{.*}} record string = 'hot2'
				; CHECK-DAG: <ENTRY abbrevid=6 op0=[[HOT3]] {{.*}} record string = 'hot3'
				; CHECK-LABEL: </VALUE_SYMTAB>

				; COMBINED: <GLOBALVAL_SUMMARY_BLOCK
				; COMBINED-NEXT: <VERSION
				; COMBINED-NEXT: <COMBINED abbrevid=
				; COMBINED-NEXT: <COMBINED abbrevid=
				; COMBINED-NEXT: <COMBINED abbrevid=
				; COMBINED-NEXT: <COMBINED abbrevid=
				; COMBINED-NEXT: <COMBINED abbrevid=
				; COMBINED-NEXT: <COMBINED abbrevid=
				; COMBINED-NEXT: <COMBINED_PROFILE {{.*}} op5=[[HOT1:.*]] op6=3 op7=[[HOT2:.*]] op8=3 op9=[[HOT3:.*]] op10=3 op11=[[COLD:.*]] op12=1 op13=[[NONE1:.*]] op14=2 op15=[[NONE2:.*]] op16=2 op17=[[NONE3:.*]] op18=2/>
				; COMBINED_NEXT: <COMBINED abbrevid=
				; COMBINED_NEXT: </GLOBALVAL_SUMMARY_BLOCK>


				; ModuleID = 'thinlto-function-summary-callgraph.ll'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; This function has a high profile count, so the entry block is hot.
				define void @hot_function(i1 %a, i1 %a2) !prof !20 {
				entry:
				call void @hot1()
				br i1 %a, label %Cold, label %Hot, !prof !41
				Cold: ; 1/1000 goes here
				call void @cold()
				call void @hot2()
				call void @none1()
				br label %exit
				Hot: ; 999/1000 goes here
				call void @hot2()
				call void @hot3()
				br i1 %a2, label %None1, label %None2, !prof !42
				None1: ; half goes here
				call void @none1()
				call void @none2()
				br label %exit
				None2: ; half goes here
				call void @none3()
				br label %exit
				exit:
				ret void
				}

				declare void @hot1() #1
				declare void @hot2() #1
				declare void @hot3() #1
				declare void @cold() #1
				declare void @none1() #1
				declare void @none2() #1
				declare void @none3() #1


				!41 = !{!"branch_weights", i32 1, i32 1000}
				!42 = !{!"branch_weights", i32 1, i32 1}



				!llvm.module.flags = !{!1}
				!20 = !{!"function_entry_count", i64 110}

				!1 = !{i32 1, !"ProfileSummary", !2}
				!2 = !{!3, !4, !5, !6, !7, !8, !9, !10}
				!3 = !{!"ProfileFormat", !"InstrProf"}
				!4 = !{!"TotalCount", i64 10000}
				!5 = !{!"MaxCount", i64 10}
				!6 = !{!"MaxInternalCount", i64 1}
				!7 = !{!"MaxFunctionCount", i64 1000}
				!8 = !{!"NumCounts", i64 3}
				!9 = !{!"NumFunctions", i64 3}
				!10 = !{!"DetailedSummary", !11}
				!11 = !{!12, !13, !14}
				!12 = !{i32 10000, i64 100, i32 1}
				!13 = !{i32 999000, i64 100, i32 1}
				!14 = !{i32 999999, i64 1, i32 2}
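
Reading the expected operands in this test: callees recorded with 3 are hot, 1 cold, and 2 neither. `hot2` is called from both the `Cold` and the `Hot` block yet is recorded as 3, and `none1` is called from `Cold` and `None1` yet is recorded as 2, which is consistent with each edge keeping the hottest of its callsites. A small sketch of that merge follows. The enum name, the `Unknown = 0` value, and `summarizeCalls` are assumptions made for illustration, not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Codes 1..3 match the operands expected by the test; Unknown = 0 is assumed.
enum class Hotness : std::uint8_t { Unknown = 0, Cold = 1, None = 2, Hot = 3 };

// Collapses per-callsite hotness into one value per callee, keeping the max.
std::map<std::string, Hotness>
summarizeCalls(const std::vector<std::pair<std::string, Hotness>> &CallSites) {
  std::map<std::string, Hotness> Edges;
  for (const auto &CS : CallSites) {
    Hotness &H = Edges[CS.first]; // Value-initialized to Unknown on first use.
    H = std::max(H, CS.second);
  }
  return Edges;
}
```

Feeding in the callsites of `@hot_function`, with each call labelled by the hotness of its block as described in the IR comments, gives `hot2` as `Hot`, `none1` as `None`, and `cold` as `Cold`.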

llvm/trunk/test/Bitcode/thinlto-function-summary-callgraph.ll

	; Test to check the callgraph in summary			; Test to check the callgraph in summary
	; RUN: opt -module-summary %s -o %t.o			; RUN: opt -module-summary %s -o %t.o
	; RUN: llvm-bcanalyzer -dump %t.o | FileCheck %s			; RUN: llvm-bcanalyzer -dump %t.o | FileCheck %s

	; RUN: opt -module-summary %p/Inputs/thinlto-function-summary-callgraph.ll -o %t2.o			; RUN: opt -module-summary %p/Inputs/thinlto-function-summary-callgraph.ll -o %t2.o
	; RUN: llvm-lto -thinlto -o %t3 %t.o %t2.o			; RUN: llvm-lto -thinlto -o %t3 %t.o %t2.o
	; RUN: llvm-bcanalyzer -dump %t3.thinlto.bc | FileCheck %s --check-prefix=COMBINED			; RUN: llvm-bcanalyzer -dump %t3.thinlto.bc | FileCheck %s --check-prefix=COMBINED

				; Check parsing for old summary versions generated from this file.
				; RUN: llvm-lto -thinlto-index-stats %p/Inputs/thinlto-function-summary-callgraph.1.bc | FileCheck %s --check-prefix=OLD
				; RUN: llvm-lto -thinlto-index-stats %p/Inputs/thinlto-function-summary-callgraph-combined.1.bc | FileCheck %s --check-prefix=OLD-COMBINED

	; CHECK: <GLOBALVAL_SUMMARY_BLOCK			; CHECK: <GLOBALVAL_SUMMARY_BLOCK
	; CHECK-NEXT: <VERSION			; CHECK-NEXT: <VERSION
	; See if the call to func is registered, using the expected callsite count			; See if the call to func is registered, using the expected callsite count
	; and value id matching the subsequent value symbol table.			; and value id matching the subsequent value symbol table.
	; CHECK-NEXT: <PERMODULE {{.*}} op4=[[FUNCID:[0-9]+]] op5=1/>			; CHECK-NEXT: <PERMODULE {{.*}} op4=[[FUNCID:[0-9]+]]/>
	; CHECK-NEXT: </GLOBALVAL_SUMMARY_BLOCK>			; CHECK-NEXT: </GLOBALVAL_SUMMARY_BLOCK>
	; CHECK-NEXT: <VALUE_SYMTAB			; CHECK-NEXT: <VALUE_SYMTAB
	; CHECK-NEXT: <FNENTRY {{.*}} record string = 'main'			; CHECK-NEXT: <FNENTRY {{.*}} record string = 'main'
	; External function func should have entry with value id FUNCID			; External function func should have entry with value id FUNCID
	; CHECK-NEXT: <ENTRY {{.*}} op0=[[FUNCID]] {{.*}} record string = 'func'			; CHECK-NEXT: <ENTRY {{.*}} op0=[[FUNCID]] {{.*}} record string = 'func'
	; CHECK-NEXT: </VALUE_SYMTAB>			; CHECK-NEXT: </VALUE_SYMTAB>

	; COMBINED: <GLOBALVAL_SUMMARY_BLOCK			; COMBINED: <GLOBALVAL_SUMMARY_BLOCK
	; COMBINED-NEXT: <VERSION			; COMBINED-NEXT: <VERSION
	; COMBINED-NEXT: <COMBINED			; COMBINED-NEXT: <COMBINED
	; See if the call to func is registered, using the expected callsite count			; See if the call to func is registered, using the expected callsite count
	; and value id matching the subsequent value symbol table.			; and value id matching the subsequent value symbol table.
	; COMBINED-NEXT: <COMBINED {{.*}} op5=[[FUNCID:[0-9]+]] op6=1/>			; COMBINED-NEXT: <COMBINED {{.*}} op5=[[FUNCID:[0-9]+]]/>
	; COMBINED-NEXT: </GLOBALVAL_SUMMARY_BLOCK>			; COMBINED-NEXT: </GLOBALVAL_SUMMARY_BLOCK>
	; COMBINED-NEXT: <VALUE_SYMTAB			; COMBINED-NEXT: <VALUE_SYMTAB
	; Entry for function func should have entry with value id FUNCID			; Entry for function func should have entry with value id FUNCID
	; COMBINED-NEXT: <COMBINED_ENTRY {{.*}} op0=[[FUNCID]] op1=7289175272376759421/>			; COMBINED-NEXT: <COMBINED_ENTRY {{.*}} op0=[[FUNCID]] op1=7289175272376759421/>
	; COMBINED-NEXT: <COMBINED			; COMBINED-NEXT: <COMBINED
	; COMBINED-NEXT: </VALUE_SYMTAB>			; COMBINED-NEXT: </VALUE_SYMTAB>

	; ModuleID = 'thinlto-function-summary-callgraph.ll'			; ModuleID = 'thinlto-function-summary-callgraph.ll'
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define i32 @main() #0 {			define i32 @main() #0 {
	entry:			entry:
	call void (...) @func()			call void (...) @func()
	ret i32 0			ret i32 0
	}			}

	declare void @func(...) #1			declare void @func(...) #1

				; OLD: Index {{.*}} contains 1 nodes (1 functions, 0 alias, 0 globals) and 1 edges (0 refs and 1 calls)
				; OLD-COMBINED: Index {{.*}} contains 2 nodes (2 functions, 0 alias, 0 globals) and 1 edges (0 refs and 1 calls)
				No newline at end of file

llvm/trunk/test/Bitcode/thinlto-function-summary-refgraph.ll

	; Test to check both the callgraph and refgraph in summary			; Test to check both the callgraph and refgraph in summary
	; RUN: opt -module-summary %s -o %t.o			; RUN: opt -module-summary %s -o %t.o
	; RUN: llvm-bcanalyzer -dump %t.o | FileCheck %s			; RUN: llvm-bcanalyzer -dump %t.o | FileCheck %s

	; See if the calls and other references are recorded properly using the			; See if the calls and other references are recorded properly using the
	; expected value id and other information as appropriate (callsite count			; expected value id and other information as appropriate (callsite count
	; for calls). Use different linkage types for the various test cases to			; for calls). Use different linkage types for the various test cases to
	; distinguish the test cases here (op1 contains the linkage type).			; distinguish the test cases here (op1 contains the linkage type).
	; Note that op3 contains the # non-call references.			; Note that op3 contains the # non-call references.
	; This also ensures that we didn't include a call or reference to intrinsic			; This also ensures that we didn't include a call or reference to intrinsic
	; llvm.ctpop.i8.			; llvm.ctpop.i8.
	; CHECK: <GLOBALVAL_SUMMARY_BLOCK			; CHECK: <GLOBALVAL_SUMMARY_BLOCK
	; Function main contains call to func, as well as address reference to func:			; Function main contains call to func, as well as address reference to func:
	; CHECK-DAG: <PERMODULE {{.*}} op0=[[MAINID:[0-9]+]] op1=0 {{.*}} op3=1 op4=[[FUNCID:[0-9]+]] op5=[[FUNCID]] op6=1/>			; CHECK-DAG: <PERMODULE {{.*}} op0=[[MAINID:[0-9]+]] op1=0 {{.*}} op3=1 op4=[[FUNCID:[0-9]+]] op5=[[FUNCID]]/>
	; Function W contains a call to func3 as well as a reference to globalvar:			; Function W contains a call to func3 as well as a reference to globalvar:
	; CHECK-DAG: <PERMODULE {{.*}} op0=[[WID:[0-9]+]] op1=5 {{.*}} op3=1 op4=[[GLOBALVARID:[0-9]+]] op5=[[FUNC3ID:[0-9]+]] op6=1/>			; CHECK-DAG: <PERMODULE {{.*}} op0=[[WID:[0-9]+]] op1=5 {{.*}} op3=1 op4=[[GLOBALVARID:[0-9]+]] op5=[[FUNC3ID:[0-9]+]]/>
	; Function X contains call to foo, as well as address reference to foo			; Function X contains call to foo, as well as address reference to foo
	; which is in the same instruction as the call:			; which is in the same instruction as the call:
	; CHECK-DAG: <PERMODULE {{.*}} op0=[[XID:[0-9]+]] op1=1 {{.*}} op3=1 op4=[[FOOID:[0-9]+]] op5=[[FOOID]] op6=1/>			; CHECK-DAG: <PERMODULE {{.*}} op0=[[XID:[0-9]+]] op1=1 {{.*}} op3=1 op4=[[FOOID:[0-9]+]] op5=[[FOOID]]/>
	; Function Y contains call to func2, and ensures we don't incorrectly add			; Function Y contains call to func2, and ensures we don't incorrectly add
	; a reference to it when reached while earlier analyzing the phi using its			; a reference to it when reached while earlier analyzing the phi using its
	; return value:			; return value:
	; CHECK-DAG: <PERMODULE {{.*}} op0=[[YID:[0-9]+]] op1=8 {{.*}} op3=0 op4=[[FUNC2ID:[0-9]+]] op5=1/>			; CHECK-DAG: <PERMODULE {{.*}} op0=[[YID:[0-9]+]] op1=8 {{.*}} op3=0 op4=[[FUNC2ID:[0-9]+]]/>
	; Function Z contains call to func2, and ensures we don't incorrectly add			; Function Z contains call to func2, and ensures we don't incorrectly add
	; a reference to it when reached while analyzing subsequent use of its return			; a reference to it when reached while analyzing subsequent use of its return
	; value:			; value:
	; CHECK-DAG: <PERMODULE {{.*}} op0=[[ZID:[0-9]+]] op1=3 {{.*}} op3=0 op4=[[FUNC2ID:[0-9]+]] op5=1/>			; CHECK-DAG: <PERMODULE {{.*}} op0=[[ZID:[0-9]+]] op1=3 {{.*}} op3=0 op4=[[FUNC2ID:[0-9]+]]/>
	; Variable bar initialization contains address reference to func:			; Variable bar initialization contains address reference to func:
	; CHECK-DAG: <PERMODULE_GLOBALVAR_INIT_REFS {{.*}} op0=[[BARID:[0-9]+]] op1=0 op2=[[FUNCID]]/>			; CHECK-DAG: <PERMODULE_GLOBALVAR_INIT_REFS {{.*}} op0=[[BARID:[0-9]+]] op1=0 op2=[[FUNCID]]/>
	; CHECK: </GLOBALVAL_SUMMARY_BLOCK>			; CHECK: </GLOBALVAL_SUMMARY_BLOCK>

	; CHECK-NEXT: <VALUE_SYMTAB			; CHECK-NEXT: <VALUE_SYMTAB
	; CHECK-DAG: <ENTRY {{.*}} op0=[[BARID]] {{.*}} record string = 'bar'			; CHECK-DAG: <ENTRY {{.*}} op0=[[BARID]] {{.*}} record string = 'bar'
	; CHECK-DAG: <ENTRY {{.*}} op0=[[FUNCID]] {{.*}} record string = 'func'			; CHECK-DAG: <ENTRY {{.*}} op0=[[FUNCID]] {{.*}} record string = 'func'
	; CHECK-DAG: <ENTRY {{.*}} op0=[[FOOID]] {{.*}} record string = 'foo'			; CHECK-DAG: <ENTRY {{.*}} op0=[[FOOID]] {{.*}} record string = 'foo'
	[86 lines not shown]

llvm/trunk/test/Transforms/FunctionImport/Inputs/hotness_based_import.ll

				; ModuleID = 'thinlto-function-summary-callgraph-profile-summary2.ll'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"


				define void @hot1() #1 {
				ret void
				}
				define void @hot2() #1 {
				call void @externalFunction()
				call void @externalFunction()
				ret void
				}
				define void @hot3() #1 {
				call void @externalFunction()
				call void @externalFunction()
				call void @externalFunction()
				ret void
				}
				define void @cold() #1 {
				ret void
				}
				define void @cold2() #1 {
				call void @externalFunction()
				call void @externalFunction()
				ret void
				}

				define void @none1() #1 {
				ret void
				}
				define void @none2() #1 {
				call void @externalFunction()
				ret void
				}
				define void @none3() #1 {
				call void @externalFunction()
				call void @externalFunction()
				ret void
				}


				declare void @externalFunction()

llvm/trunk/test/Transforms/FunctionImport/hotness_based_import.ll

				; Test to check the callgraph in summary when there is PGO
				; RUN: opt -module-summary %s -o %t.bc
				; RUN: opt -module-summary %p/Inputs/hotness_based_import.ll -o %t2.bc
				; RUN: llvm-lto -thinlto -o %t3 %t.bc %t2.bc

				; Test import with default hot multiplier (3)
				; RUN: opt -function-import -summary-file %t3.thinlto.bc %t.bc -import-instr-limit=1 --S | FileCheck %s --check-prefix=CHECK --check-prefix=HOT-DEFAULT
				; RUN: opt -function-import -summary-file %t3.thinlto.bc %t.bc -import-instr-limit=1 --S -import-hot-multiplier=3.0 | FileCheck %s --check-prefix=CHECK --check-prefix=HOT-DEFAULT
				; HOT-DEFAULT-DAG: define available_externally void @hot1()
				; HOT-DEFAULT-DAG: define available_externally void @hot2()
				; HOT-DEFAULT-DAG: define available_externally void @cold()
				; HOT-DEFAULT-DAG: define available_externally void @none1()

				; HOT-DEFAULT-NOT: define available_externally void @hot3()
				; HOT-DEFAULT-NOT: define available_externally void @none2()
				; HOT-DEFAULT-NOT: define available_externally void @none3()
				; HOT-DEFAULT-NOT: define available_externally void @cold2()


				; Test import with hot multiplier 1.0 - treat hot callsites as normal.
				; RUN: opt -function-import -summary-file %t3.thinlto.bc %t.bc -import-instr-limit=1 -import-hot-multiplier=1.0 --S | FileCheck %s --check-prefix=CHECK --check-prefix=HOT-ONE
				; HOT-ONE-DAG: define available_externally void @hot1()
				; HOT-ONE-DAG: define available_externally void @cold()
				; HOT-ONE-DAG: define available_externally void @none1()
				; HOT-ONE-NOT: define available_externally void @hot2()
				; HOT-ONE-NOT: define available_externally void @hot3()
				; HOT-ONE-NOT: define available_externally void @none2()
				; HOT-ONE-NOT: define available_externally void @none3()
				; HOT-ONE-NOT: define available_externally void @cold2()


				; Test import with hot multiplier 0.0 and high threshold - don't import functions called from hot callsite.
				; RUN: opt -function-import -summary-file %t3.thinlto.bc %t.bc -import-instr-limit=10 -import-hot-multiplier=0.0 --S | FileCheck %s --check-prefix=CHECK --check-prefix=HOT-ZERO
				; HOT-ZERO-DAG: define available_externally void @cold()
				; HOT-ZERO-DAG: define available_externally void @none1()
				; HOT-ZERO-DAG: define available_externally void @none2()
				; HOT-ZERO-DAG: define available_externally void @none3()
				; HOT-ZERO-DAG: define available_externally void @cold2()
				; HOT-ZERO-NOT: define available_externally void @hot2()
				; HOT-ZERO-NOT: define available_externally void @hot1()
				; HOT-ZERO-NOT: define available_externally void @hot3()



				; ModuleID = 'thinlto-function-summary-callgraph.ll'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; This function has a high profile count, so the entry block is hot.
				define void @hot_function(i1 %a, i1 %a2) !prof !20 {
				entry:
				call void @hot1()
				br i1 %a, label %Cold, label %Hot, !prof !41
				Cold: ; 1/1000 goes here
				call void @cold()
				call void @cold2()
				call void @hot2()
				call void @none1()
				br label %exit
				Hot: ; 999/1000 goes here
				call void @hot2()
				call void @hot3()
				br i1 %a2, label %None1, label %None2, !prof !42
				None1: ; half goes here
				call void @none1()
				call void @none2()
				br label %exit
				None2: ; half goes here
				call void @none3()
				br label %exit
				exit:
				ret void
				}

				declare void @hot1() #1
				declare void @hot2() #1
				declare void @hot3() #1
				declare void @cold() #1
				declare void @cold2() #1
				declare void @none1() #1
				declare void @none2() #1
				declare void @none3() #1


				!41 = !{!"branch_weights", i32 1, i32 1000}
				!42 = !{!"branch_weights", i32 1, i32 1}



				!llvm.module.flags = !{!1}
				!20 = !{!"function_entry_count", i64 110}

				!1 = !{i32 1, !"ProfileSummary", !2}
				!2 = !{!3, !4, !5, !6, !7, !8, !9, !10}
				!3 = !{!"ProfileFormat", !"InstrProf"}
				!4 = !{!"TotalCount", i64 10000}
				!5 = !{!"MaxCount", i64 10}
				!6 = !{!"MaxInternalCount", i64 1}
				!7 = !{!"MaxFunctionCount", i64 1000}
				!8 = !{!"NumCounts", i64 3}
				!9 = !{!"NumFunctions", i64 3}
				!10 = !{!"DetailedSummary", !11}
				!11 = !{!12, !13, !14}
				!12 = !{i32 10000, i64 100, i32 1}
				!13 = !{i32 999000, i64 100, i32 1}
				!14 = !{i32 999999, i64 1, i32 2}
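
The three RUN configurations in this test can be checked by hand with a simple rule: a callee is imported when its instruction count does not exceed the limit, and the limit is scaled by the hot multiplier for callees reached from a hot callsite. The sketch below encodes that rule with instruction counts read off `Inputs/hotness_based_import.ll` (calls plus the `ret`). `Callee`, `selectImports`, and `exampleCallees` are hypothetical names, and the rule itself is an assumption that happens to reproduce the expected sets, not a copy of the importer.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Callee {
  std::string Name;
  unsigned InstCount; // Instructions in the callee body (calls + ret).
  bool Hot;           // True if some callsite of this callee is hot.
};

// Assumed rule: threshold = limit * multiplier for hot callees, else limit.
std::set<std::string> selectImports(const std::vector<Callee> &Callees,
                                    unsigned InstrLimit, float HotMultiplier) {
  assert(HotMultiplier >= 0.0f);
  std::set<std::string> Imported;
  for (const Callee &C : Callees) {
    const float Threshold = C.Hot ? InstrLimit * HotMultiplier
                                  : static_cast<float>(InstrLimit);
    if (C.InstCount <= Threshold)
      Imported.insert(C.Name);
  }
  return Imported;
}

// Callees of @hot_function as defined in the Inputs file of this test.
std::vector<Callee> exampleCallees() {
  return {{"hot1", 1, true},   {"hot2", 3, true},   {"hot3", 4, true},
          {"cold", 1, false},  {"cold2", 3, false}, {"none1", 1, false},
          {"none2", 2, false}, {"none3", 3, false}};
}
```

With limit 1 and multiplier 3.0 this selects `hot1`, `hot2`, `cold`, and `none1`; with multiplier 1.0 it drops `hot2`; with limit 10 and multiplier 0.0 it selects only the non-hot callees. These match the HOT-DEFAULT, HOT-ONE, and HOT-ZERO expectations above.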

Diff 72558
