This is an archive of the discontinued LLVM Phabricator instance.

[SampleFDO] Flow Sensitive Sample FDO (FSAFDO)
Needs Review · Public

Authored by xur on Mar 22 2021, 3:36 PM.

Details

Reviewers

davidxl
wmi
hoy
snehasish

Summary

This patch implements Flow Sensitive Sample FDO (FSAFDO). It makes the following
changes:
(1) disables the current discriminator encoding scheme;
(2) adds a new hierarchical discriminator for FSAFDO;
(3) adds an FSAFDO profile loader.

For this patch, "-enable-fs-discriminator=true" turns on the new functionality.
"-enable-fs-discriminator=false" (the default) keeps the current Sample FDO behavior.

This patch is not intended for check-in. I am posting it mainly to get advice on how to break
it into smaller patches. For the same reason, I did not include test cases.

Diff Detail

Event Timeline

xur created this revision. Mar 22 2021, 3:36 PM

Herald added subscribers: dexonsmith, wenlei, pengfei and 4 others. Mar 22 2021, 3:36 PM

xur requested review of this revision. Mar 22 2021, 3:36 PM

Herald added a project: Restricted Project. Mar 22 2021, 3:36 PM

Harbormaster completed remote builds in B95106: Diff 332450. Mar 22 2021, 5:06 PM

davidxl added inline comments. Mar 22 2021, 8:57 PM

llvm/lib/CodeGen/TargetPassConfig.cpp
1178	Is this pass necessary? BranchFolding does not create new clones but merges them, so discriminator subsections can be reused (even though some of the discriminators in that section are removed after branch folding)?
1478	I can see the importance of adding a pass to add discriminators after MBP because of the tail duplication in MBP, but how important (performance-wise) is it to load the sample profile again for the branch folding pass?

xur added inline comments. Mar 23 2021, 10:12 AM

llvm/lib/CodeGen/TargetPassConfig.cpp
1178	I added this mainly for the tail duplication in the block placement pass. Conceptually it is for all the new clones in the pipeline. I have statistics from the counters for the different rounds of discriminators, and we do have plenty of them. That said, most of the performance comes from the round before block placement. If I disable this, the performance change is within the noise range (for the benchmark I used).
1478	This is from my experiments. The default is off anyway. I will probably remove this.

Regarding the staging of the patch, here is my suggestion:

1) Replace DF with FSDiscriminator.

In this patch, FSDiscriminator needs to be added/inserted in the last phase (with ranges of discriminator sections reserved but not used), and no new sample loading passes will be added. The functionality of the current DF should be mostly preserved.

2) Add the other AddDiscriminator passes.

3) Add the new sample profile loader passes one by one.

For 1), the offline profile reading tool should be able to handle both versions of the format so that profiles can co-exist.

llvm/lib/CodeGen/TargetPassConfig.cpp
1178	We can probably make use of this discriminator range for other purposes (target optimizations).
1478	If it is not important for performance, I suggest making it (the sample profile loading part) off by default to avoid an unnecessary compile time increase.

snehasish added inline comments. Mar 29 2021, 9:45 AM

llvm/include/llvm/Support/FSAFDODiscriminator.h
19	A few questions about the discriminator bits:

Depending on the transformation in the target pass, the number of bits required may differ, i.e. 5 bits for each may be too many or too few. Do you have any data to share about how many bits are used by each?

How do we alert authors of new target optimizations (or code refactoring) that additional discriminator bits are needed to disambiguate? Would a late-stage, analysis-only pass which enumerates different instructions with the same debug+discriminator info be useful to commit?

If I understand correctly, we bump the bit for each level of cloning. This seems to be a less efficient coding scheme: at most 5 bits are used, whereas by enumeration you could identify 31 clones. Have you considered other coding schemes?

davidxl added inline comments. Mar 29 2021, 10:38 AM

llvm/include/llvm/Support/FSAFDODiscriminator.h
19	> A few questions about the discriminator bits: Depending on the transformation in the target pass, the number of bits required may differ, i.e. 5 bits for each may be too many or too few. Do you have any data to share about how many bits are used by each?

I assume most of the transformations produce few clones except for unrolling (which depends on the unroll factor).

> How do we alert authors of new target optimizations (or code refactoring) that additional discriminator bits are needed to disambiguate? Would a late-stage, analysis-only pass which enumerates different instructions with the same debug+discriminator info be useful to commit?

The problem with this is that the authors won't have any means to change anything. Rong and I discussed this. Longer term, when this becomes an issue, increasing the size of the discriminator container type will be the way to go.

> If I understand correctly, we bump the bit for each level of cloning. This seems to be a less efficient coding scheme: at most 5 bits are used, whereas by enumeration you could identify 31 clones. Have you considered other coding schemes?

The biggest advantage of fixed width is simplicity, I think.

snehasish added inline comments. Mar 29 2021, 11:10 AM

llvm/include/llvm/Support/FSAFDODiscriminator.h
19	> The biggest advantage of fixed width is simplicity, I think.

Not sure if I made the point clearly, so repeating: I'm trying to distinguish between using 5 values (one per bit) in a pass vs. the 2^5-1 values which can be enumerated by 5 bits. For example, with a loop unroll factor of >5, this coding scheme will not be able to assign unique ids to all clones. Is this understanding correct? I think redistributing the bits based on the pass transformations should be sufficient to avoid a more complex coding scheme. I agree that using fixed-width bits makes for a simple implementation, and we should prefer this unless data shows otherwise.

davidxl added inline comments. Mar 29 2021, 11:39 AM

llvm/include/llvm/Support/FSAFDODiscriminator.h
19	Rong can answer this question more thoroughly. Just looking at line 373 of FlowSensitiveSampleProfile.cpp, it seems it actually increases the FSD sequentially instead of bumping a bit?

snehasish added inline comments. Mar 30 2021, 3:15 PM

llvm/include/llvm/Support/FSAFDODiscriminator.h
19	Thanks for the pointer; Rong clarified the usage. My understanding was incorrect, and the encoding space is not as limited as I thought. Since there is no measurable performance difference, the current approach seems simplest.
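
The distinction discussed in this thread (one id per bit versus sequentially numbered ids in the same fixed-width field) can be illustrated with a minimal sketch. All names below are hypothetical and do not come from the patch; the 5-bit width is only the figure used in the comments above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; none of these names come from the patch.

// Number of distinct non-zero ids if every clone claims one bit of a
// Width-bit field ("bumping a bit" per clone).
constexpr uint32_t oneHotCapacity(unsigned Width) { return Width; }

// Number of distinct non-zero ids if clones are numbered sequentially
// within the same Width-bit field.
constexpr uint32_t sequentialCapacity(unsigned Width) {
  return Width >= 32 ? 0xFFFFFFFFu : (1u << Width) - 1u;
}

// Return the next sequential id after LastId, or 0 once the field is full.
inline uint32_t nextSequentialId(uint32_t LastId, unsigned Width) {
  assert(Width > 0 && Width <= 32 && "field width out of range");
  return LastId < sequentialCapacity(Width) ? LastId + 1 : 0;
}
```

With a 5-bit field, the one-bit-per-clone reading allows 5 ids, while sequential numbering allows 31, which is the gap the thread was trying to pin down.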

snehasish resigned from this revision. Jan 26 2023, 1:22 PM

Herald added a project: Restricted Project. Jan 26 2023, 1:22 PM

Herald added subscribers: pcwang-thead, ormris.

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/FlowSensitiveSampleProfile.h	103 lines
llvm/include/llvm/CodeGen/MachineDominators.h	6 lines
llvm/include/llvm/CodeGen/MachineOptimizationRemarkEmitter.h	6 lines
llvm/include/llvm/CodeGen/Passes.h	13 lines
llvm/include/llvm/IR/DebugInfoMetadata.h	24 lines
llvm/include/llvm/InitializePasses.h	18 lines
llvm/include/llvm/LTO/Config.h	3 lines
llvm/include/llvm/ProfileData/SampleProfReader.h	12 lines
llvm/include/llvm/Support/FSAFDODiscriminator.h	100 lines
llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h	45 lines
llvm/lib/CodeGen/CMakeLists.txt	1 line
llvm/lib/CodeGen/FlowSensitiveSampleProfile.cpp	454 lines
llvm/lib/CodeGen/TargetPassConfig.cpp	38 lines
llvm/lib/LTO/LTOBackend.cpp	9 lines
llvm/lib/Passes/PassBuilder.cpp	11 lines
llvm/lib/ProfileData/SampleProf.cpp	26 lines
llvm/lib/ProfileData/SampleProfReader.cpp	56 lines
llvm/lib/Target/X86/X86InsertPrefetch.cpp	1 line
llvm/lib/Transforms/IPO/SampleProfile.cpp	1 line
llvm/lib/Transforms/Utils/LoopUnroll.cpp	2 lines
llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp	2 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	5 lines
llvm/tools/llvm-profdata/llvm-profdata.cpp	38 lines
llvm/unittests/ProfileData/SampleProfTest.cpp	3 lines

Diff 332450

llvm/include/llvm/CodeGen/FlowSensitiveSampleProfile.h

This file was added.

				//===----- FlowSensitiveSampleProfile.h: FS SampleFDO Support ---- c++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the supporting functions for Flow Sensitive Sample FDO.
				// The AddFSDiscriminators pass adds flow sensitive DWARF discriminators to the
				// instructions, so that different instruction clones will have their own
				// sample value.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CODEGEN_FLOWSENSITIVESAMPLEPROFILE_H
				#define LLVM_CODEGEN_FLOWSENSITIVESAMPLEPROFILE_H

				#include "llvm/Analysis/ProfileSummaryInfo.h"
				#include "llvm/CodeGen/MachineBasicBlock.h"
				#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
				#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
				#include "llvm/CodeGen/MachineDominators.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/CodeGen/MachineLoopInfo.h"
				#include "llvm/CodeGen/MachineOptimizationRemarkEmitter.h"
				#include "llvm/CodeGen/MachinePostDominators.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/IR/DebugInfoMetadata.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/Module.h"
				#include "llvm/InitializePasses.h"
				#include "llvm/ProfileData/InstrProf.h"
				#include "llvm/ProfileData/SampleProf.h"
				#include "llvm/ProfileData/SampleProfReader.h"

				#include <cassert>

				namespace llvm {

				class AddFSDiscriminators : public MachineFunctionPass {
				  MachineFunction *MF;
				  unsigned LowBit;
				  unsigned HighBit;

				public:
				  static char ID;
				  /// FS bits will only use the '1' bits in the Mask.
				  AddFSDiscriminators(unsigned LowBit = 0, unsigned HighBit = 0)
				      : MachineFunctionPass(ID), LowBit(LowBit), HighBit(HighBit) {
				    assert(LowBit < HighBit && "HighBit needs to be greater than LowBit");
				  }

				  /// getNumFSBBs() - Return the number of BBs that have FS samples.
				  unsigned getNumFSBBs();

				  /// getNumFSSamples() - Return the number of samples that are flow sensitive.
				  uint64_t getNumFSSamples();

				  /// getMachineFunction - Return the last machine function computed.
				  const MachineFunction *getMachineFunction() const { return MF; }

				private:
				  bool runOnMachineFunction(MachineFunction &) override;
				};

				class FSProfileLoader;
				class FSProfileLoaderPass : public MachineFunctionPass {
				  MachineFunction *MF;
				  std::string ProfileFileName;
				  unsigned LowBit;
				  unsigned HighBit;

				public:
				  static char ID;
				  /// FS bits will only use the '1' bits in the Mask.
				  FSProfileLoaderPass(std::string Filename = "", unsigned LowBit = 0,
				                      unsigned HighBit = 0)
				      : MachineFunctionPass(ID), ProfileFileName(Filename), LowBit(LowBit),
				        HighBit(HighBit),
				        FSSampleLoader(std::make_unique<FSProfileLoader>(Filename)) {
				    assert(LowBit < HighBit && "HighBit needs to be greater than LowBit");
				  }

				  /// getMachineFunction - Return the last machine function computed.
				  const MachineFunction *getMachineFunction() const { return MF; }

				private:
				  void init(MachineFunction &MF);
				  bool runOnMachineFunction(MachineFunction &) override;
				  bool doInitialization(Module &M) override;
				  void getAnalysisUsage(AnalysisUsage &AU) const override;

				  std::unique_ptr<FSProfileLoader> FSSampleLoader;
				  /// Hold the information of the basic block frequency.
				  MachineBlockFrequencyInfo *MBFI;
				};

				} // end namespace llvm

				#endif // LLVM_CODEGEN_FLOWSENSITIVESAMPLEPROFILE_H
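
Both passes declared above take a LowBit/HighBit pair that selects which discriminator bits they own. A minimal sketch of how a value might be stored into and read back from such a range follows. The helper names are hypothetical, and the range is assumed to be 0-based and inclusive at both ends; the patch's actual implementation (in FlowSensitiveSampleProfile.cpp) is not shown in this listing and may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not part of the patch. The range [LowBit, HighBit]
// is assumed to be 0-based and inclusive.

// Mask covering the low Width bits (Width in 1..32).
inline uint32_t widthMask(unsigned Width) {
  return Width >= 32 ? 0xFFFFFFFFu : (1u << Width) - 1u;
}

// Read the value stored in bits [LowBit, HighBit] of Discriminator.
inline uint32_t getBitRange(uint32_t Discriminator, unsigned LowBit,
                            unsigned HighBit) {
  assert(LowBit < HighBit && HighBit < 32 && "invalid bit range");
  return (Discriminator >> LowBit) & widthMask(HighBit - LowBit + 1);
}

// Overwrite bits [LowBit, HighBit] with Value; all other bits are preserved.
inline uint32_t setBitRange(uint32_t Discriminator, unsigned LowBit,
                            unsigned HighBit, uint32_t Value) {
  assert(LowBit < HighBit && HighBit < 32 && "invalid bit range");
  uint32_t Mask = widthMask(HighBit - LowBit + 1);
  assert(Value <= Mask && "value does not fit in the bit range");
  return (Discriminator & ~(Mask << LowBit)) | (Value << LowBit);
}
```

For instance, under these assumptions, storing the value 3 into bits 8..12 of a discriminator whose base part is 0x5 gives 0x305, and the base part in the low bits is left untouched.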

llvm/include/llvm/CodeGen/MachineDominators.h

Show First 20 Lines • Show All 106 Lines • ▼ Show 20 Lines	public:
void calculate(MachineFunction &F);		void calculate(MachineFunction &F);

bool dominates(const MachineDomTreeNode *A,		bool dominates(const MachineDomTreeNode *A,
const MachineDomTreeNode *B) const {		const MachineDomTreeNode *B) const {
applySplitCriticalEdges();		applySplitCriticalEdges();
return DT->dominates(A, B);		return DT->dominates(A, B);
}		}

		void getDescendants(MachineBasicBlock *A,
		SmallVectorImpl<MachineBasicBlock *> &Result) {
		applySplitCriticalEdges();
		DT->getDescendants(A, Result);
		}

bool dominates(const MachineBasicBlock *A, const MachineBasicBlock *B) const {		bool dominates(const MachineBasicBlock *A, const MachineBasicBlock *B) const {
applySplitCriticalEdges();		applySplitCriticalEdges();
return DT->dominates(A, B);		return DT->dominates(A, B);
}		}

// dominates - Return true if A dominates B. This performs the		// dominates - Return true if A dominates B. This performs the
// special checks necessary if A and B are in the same basic block.		// special checks necessary if A and B are in the same basic block.
bool dominates(const MachineInstr *A, const MachineInstr *B) const {		bool dominates(const MachineInstr *A, const MachineInstr *B) const {
▲ Show 20 Lines • Show All 159 Lines • Show Last 20 Lines

llvm/include/llvm/CodeGen/MachineOptimizationRemarkEmitter.h

Show First 20 Lines • Show All 112 Lines • ▼ Show 20 Lines	public:
/// remark. \p Loc is the debug location and \p MBB is the block that the		/// remark. \p Loc is the debug location and \p MBB is the block that the
/// optimization operates in.		/// optimization operates in.
MachineOptimizationRemarkAnalysis(const char *PassName, StringRef RemarkName,		MachineOptimizationRemarkAnalysis(const char *PassName, StringRef RemarkName,
const DiagnosticLocation &Loc,		const DiagnosticLocation &Loc,
const MachineBasicBlock *MBB)		const MachineBasicBlock *MBB)
: DiagnosticInfoMIROptimization(DK_MachineOptimizationRemarkAnalysis,		: DiagnosticInfoMIROptimization(DK_MachineOptimizationRemarkAnalysis,
PassName, RemarkName, Loc, MBB) {}		PassName, RemarkName, Loc, MBB) {}

		MachineOptimizationRemarkAnalysis(const char *PassName, StringRef RemarkName,
		const MachineInstr *MI)
		: DiagnosticInfoMIROptimization(DK_MachineOptimizationRemarkAnalysis,
		PassName, RemarkName, MI->getDebugLoc(),
		MI->getParent()) {}

static bool classof(const DiagnosticInfo *DI) {		static bool classof(const DiagnosticInfo *DI) {
return DI->getKind() == DK_MachineOptimizationRemarkAnalysis;		return DI->getKind() == DK_MachineOptimizationRemarkAnalysis;
}		}

/// \see DiagnosticInfoOptimizationBase::isEnabled.		/// \see DiagnosticInfoOptimizationBase::isEnabled.
bool isEnabled() const override {		bool isEnabled() const override {
const Function &Fn = getFunction();		const Function &Fn = getFunction();
LLVMContext &Ctx = Fn.getContext();		LLVMContext &Ctx = Fn.getContext();
▲ Show 20 Lines • Show All 103 Lines • Show Last 20 Lines

llvm/include/llvm/CodeGen/Passes.h

Show First 20 Lines • Show All 158 Lines • ▼ Show 20 Lines	/// MachineDominanaceFrontier - This pass is a machine dominators analysis pass.
extern char &DeadMachineInstructionElimID;		extern char &DeadMachineInstructionElimID;

/// This pass adds dead/undef flags after analyzing subregister lanes.		/// This pass adds dead/undef flags after analyzing subregister lanes.
extern char &DetectDeadLanesID;		extern char &DetectDeadLanesID;

/// This pass perform post-ra machine sink for COPY instructions.		/// This pass perform post-ra machine sink for COPY instructions.
extern char &PostRAMachineSinkingID;		extern char &PostRAMachineSinkingID;

		/// This pass adds flow sensitive discriminators.
		extern char &AddFSDiscriminatorsID;

		/// This pass reads flow sensitive profile.
		extern char &FSProfileLoaderPassID;

/// FastRegisterAllocation Pass - This pass register allocates as fast as		/// FastRegisterAllocation Pass - This pass register allocates as fast as
/// possible. It is best suited for debug code where live ranges are short.		/// possible. It is best suited for debug code where live ranges are short.
///		///
FunctionPass *createFastRegisterAllocator();		FunctionPass *createFastRegisterAllocator();

/// BasicRegisterAllocation Pass - This pass implements a degenerate global		/// BasicRegisterAllocation Pass - This pass implements a degenerate global
/// register allocator using the basic regalloc framework.		/// register allocator using the basic regalloc framework.
///		///
▲ Show 20 Lines • Show All 301 Lines • ▼ Show 20 Lines	/// MachineDominanaceFrontier - This pass is a machine dominators analysis pass.
FunctionPass *createHardwareLoopsPass();		FunctionPass *createHardwareLoopsPass();

/// This pass inserts pseudo probe annotation for callsite profiling.		/// This pass inserts pseudo probe annotation for callsite profiling.
FunctionPass *createPseudoProbeInserter();		FunctionPass *createPseudoProbeInserter();

/// Create IR Type Promotion pass. \see TypePromotion.cpp		/// Create IR Type Promotion pass. \see TypePromotion.cpp
FunctionPass *createTypePromotionPass();		FunctionPass *createTypePromotionPass();

		/// Add Flow Sensitive Discriminators.
		FunctionPass *createAddFSDiscriminatorsPass(unsigned LowBit,
		unsigned HighBit);

		/// Read Flow Sensitive Profile.
		FunctionPass *createFSProfileLoaderPass(std::string File, unsigned LowBit,
		unsigned HighBit);
/// Creates MIR Debugify pass. \see MachineDebugify.cpp		/// Creates MIR Debugify pass. \see MachineDebugify.cpp
ModulePass *createDebugifyMachineModulePass();		ModulePass *createDebugifyMachineModulePass();

/// Creates MIR Strip Debug pass. \see MachineStripDebug.cpp		/// Creates MIR Strip Debug pass. \see MachineStripDebug.cpp
/// If OnlyDebugified is true then it will only strip debug info if it was		/// If OnlyDebugified is true then it will only strip debug info if it was
/// added by a Debugify pass. The module will be left unchanged if the debug		/// added by a Debugify pass. The module will be left unchanged if the debug
/// info was generated by another source such as clang.		/// info was generated by another source such as clang.
ModulePass *createStripDebugMachineModulePass(bool OnlyDebugified);		ModulePass *createStripDebugMachineModulePass(bool OnlyDebugified);
Show All 18 Lines

llvm/include/llvm/IR/DebugInfoMetadata.h

Show All 20 Lines
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/iterator_range.h"		#include "llvm/ADT/iterator_range.h"
#include "llvm/BinaryFormat/Dwarf.h"		#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
		#include "llvm/Support/CommandLine.h"
		#include "llvm/Support/FSAFDODiscriminator.h"
#include <cassert>		#include <cassert>
#include <climits>		#include <climits>
#include <cstddef>		#include <cstddef>
#include <cstdint>		#include <cstdint>
#include <iterator>		#include <iterator>
#include <type_traits>		#include <type_traits>
#include <vector>		#include <vector>

Show All 16 Lines	#define DEFINE_MDNODE_GET(CLASS, FORMAL, ARGS) \
} \		} \
static CLASS *getIfExists(LLVMContext &Context, \		static CLASS *getIfExists(LLVMContext &Context, \
DEFINE_MDNODE_GET_UNPACK(FORMAL)) { \		DEFINE_MDNODE_GET_UNPACK(FORMAL)) { \
return getImpl(Context, DEFINE_MDNODE_GET_UNPACK(ARGS), Uniqued, \		return getImpl(Context, DEFINE_MDNODE_GET_UNPACK(ARGS), Uniqued, \
/* ShouldCreate */ false); \		/* ShouldCreate */ false); \
} \		} \
DEFINE_MDNODE_GET_DISTINCT_TEMPORARY(CLASS, FORMAL, ARGS)		DEFINE_MDNODE_GET_DISTINCT_TEMPORARY(CLASS, FORMAL, ARGS)

		extern llvm::cl::opt<bool> EnableFSDiscriminator;

namespace llvm {		namespace llvm {

class DITypeRefArray {		class DITypeRefArray {
const MDTuple *N = nullptr;		const MDTuple *N = nullptr;

public:		public:
DITypeRefArray() = default;		DITypeRefArray() = default;
DITypeRefArray(const MDTuple *N) : N(N) {}		DITypeRefArray(const MDTuple *N) : N(N) {}
▲ Show 20 Lines • Show All 1,684 Lines • ▼ Show 20 Lines	public:

/// Try to combine the vector of locations passed as input in a single one.		/// Try to combine the vector of locations passed as input in a single one.
/// This function applies getMergedLocation() repeatedly left-to-right.		/// This function applies getMergedLocation() repeatedly left-to-right.
///		///
/// \p Locs: The locations to be merged.		/// \p Locs: The locations to be merged.
static		static
const DILocation *getMergedLocations(ArrayRef<const DILocation *> Locs);		const DILocation *getMergedLocations(ArrayRef<const DILocation *> Locs);

		/// Return discriminator \p D with all bits above bit \p B (0-based) cleared.
		/// For example, (0x1FF, 7) yields 0xFF. If \p B is 0, \p D is returned unchanged.
		static unsigned getMaskedDiscriminator(unsigned D, unsigned B) {
		if (B == 0)
		return D;
		return (D & getN1Bits(B));
		}
		static unsigned getBaseDiscriminatorBits() { return BASE_DIS_BIT_END; }

/// Returns the base discriminator for a given encoded discriminator \p D.		/// Returns the base discriminator for a given encoded discriminator \p D.
static unsigned getBaseDiscriminatorFromDiscriminator(unsigned D) {		static unsigned getBaseDiscriminatorFromDiscriminator(unsigned D) {
		if (EnableFSDiscriminator)
		return getMaskedDiscriminator(D, getBaseDiscriminatorBits());
return getUnsignedFromPrefixEncoding(D);		return getUnsignedFromPrefixEncoding(D);
}		}

/// Raw encoding of the discriminator. APIs such as cloneWithDuplicationFactor		/// Raw encoding of the discriminator. APIs such as cloneWithDuplicationFactor
/// have certain special case behavior (e.g. treating empty duplication factor		/// have certain special case behavior (e.g. treating empty duplication factor
/// as the value '1').		/// as the value '1').
/// This API, in conjunction with cloneWithDiscriminator, may be used to encode		/// This API, in conjunction with cloneWithDiscriminator, may be used to encode
/// the raw values provided. \p BD: base discriminator \p DF: duplication factor		/// the raw values provided. \p BD: base discriminator \p DF: duplication factor
/// \p CI: copy index		/// \p CI: copy index
/// The return is None if the values cannot be encoded in 32 bits - for		/// The return is None if the values cannot be encoded in 32 bits - for
/// example, values for BD or DF larger than 12 bits. Otherwise, the return		/// example, values for BD or DF larger than 12 bits. Otherwise, the return
/// is the encoded value.		/// is the encoded value.
static Optional<unsigned> encodeDiscriminator(unsigned BD, unsigned DF, unsigned CI);		static Optional<unsigned> encodeDiscriminator(unsigned BD, unsigned DF, unsigned CI);

/// Raw decoder for values in an encoded discriminator D.		/// Raw decoder for values in an encoded discriminator D.
static void decodeDiscriminator(unsigned D, unsigned &BD, unsigned &DF,		static void decodeDiscriminator(unsigned D, unsigned &BD, unsigned &DF,
unsigned &CI);		unsigned &CI);

/// Returns the duplication factor for a given encoded discriminator \p D, or		/// Returns the duplication factor for a given encoded discriminator \p D, or
/// 1 if no value or 0 is encoded.		/// 1 if no value or 0 is encoded.
static unsigned getDuplicationFactorFromDiscriminator(unsigned D) {		static unsigned getDuplicationFactorFromDiscriminator(unsigned D) {
		if (EnableFSDiscriminator)
		return 1;
D = getNextComponentInDiscriminator(D);		D = getNextComponentInDiscriminator(D);
unsigned Ret = getUnsignedFromPrefixEncoding(D);		unsigned Ret = getUnsignedFromPrefixEncoding(D);
if (Ret == 0)		if (Ret == 0)
return 1;		return 1;
return Ret;		return Ret;
}		}

/// Returns the copy identifier for a given encoded discriminator \p D.		/// Returns the copy identifier for a given encoded discriminator \p D.
▲ Show 20 Lines • Show All 423 Lines • ▼ Show 20 Lines
}		}

unsigned DILocation::getCopyIdentifier() const {		unsigned DILocation::getCopyIdentifier() const {
return getCopyIdentifierFromDiscriminator(getDiscriminator());		return getCopyIdentifierFromDiscriminator(getDiscriminator());
}		}

Optional<const DILocation *> DILocation::cloneWithBaseDiscriminator(unsigned D) const {		Optional<const DILocation *> DILocation::cloneWithBaseDiscriminator(unsigned D) const {
unsigned BD, DF, CI;		unsigned BD, DF, CI;
		if (EnableFSDiscriminator)
		BD = getBaseDiscriminator();
		else
decodeDiscriminator(getDiscriminator(), BD, DF, CI);		decodeDiscriminator(getDiscriminator(), BD, DF, CI);
if (D == BD)		if (D == BD)
return this;		return this;
		if (EnableFSDiscriminator)
		return cloneWithDiscriminator(D);
if (Optional<unsigned> Encoded = encodeDiscriminator(D, DF, CI))		if (Optional<unsigned> Encoded = encodeDiscriminator(D, DF, CI))
return cloneWithDiscriminator(*Encoded);		return cloneWithDiscriminator(*Encoded);
return None;		return None;
}		}

Optional<const DILocation *> DILocation::cloneByMultiplyingDuplicationFactor(unsigned DF) const {		Optional<const DILocation *> DILocation::cloneByMultiplyingDuplicationFactor(unsigned DF) const {
DF *= getDuplicationFactor();		DF *= getDuplicationFactor();
if (DF <= 1)		if (DF <= 1)
▲ Show 20 Lines • Show All 1,424 Lines • Show Last 20 Lines
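
The masking done by getMaskedDiscriminator in this file can be tried in isolation with the sketch below. getN1Bits is defined in FSAFDODiscriminator.h, which is not shown in this listing, so lowBitsMask here is an assumed stand-in chosen to agree with the documented example (0x1FF, 7) = 0xFF; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed stand-in for getN1Bits: a mask with bits 0..N set (N is 0-based).
// The real helper is in FSAFDODiscriminator.h and may differ.
inline uint32_t lowBitsMask(unsigned N) {
  assert(N < 32 && "bit index out of range");
  return N == 31 ? 0xFFFFFFFFu : (1u << (N + 1)) - 1u;
}

// Keep bits 0..B of D and clear everything above them.
inline uint32_t maskedDiscriminator(uint32_t D, unsigned B) {
  if (B == 0) // Same special case as in the patch: return D unchanged.
    return D;
  return D & lowBitsMask(B);
}
```

Under this assumption, maskedDiscriminator(0x1FF, 7) evaluates to 0xFF, matching the example in the doc comment.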

llvm/include/llvm/InitializePasses.h

	Show First 20 Lines • Show All 55 Lines • ▼ Show 20 Lines

	/// Initialize all passes linked into the GlobalISel library.			/// Initialize all passes linked into the GlobalISel library.
	void initializeGlobalISel(PassRegistry&);			void initializeGlobalISel(PassRegistry&);

	/// Initialize all passes linked into the CodeGen library.			/// Initialize all passes linked into the CodeGen library.
	void initializeTarget(PassRegistry&);			void initializeTarget(PassRegistry&);

	void initializeAAEvalLegacyPassPass(PassRegistry&);			void initializeAAEvalLegacyPassPass(PassRegistry&);
	void initializeAAResultsWrapperPassPass(PassRegistry&);			void initializeAAResultsWrapperPassPass(PassRegistry &);
	void initializeADCELegacyPassPass(PassRegistry&);			void initializeADCELegacyPassPass(PassRegistry &);
	void initializeAddDiscriminatorsLegacyPassPass(PassRegistry&);			void initializeAddDiscriminatorsLegacyPassPass(PassRegistry &);
				void initializeAddFSDiscriminatorsPass(PassRegistry &);
	void initializeModuleAddressSanitizerLegacyPassPass(PassRegistry &);			void initializeModuleAddressSanitizerLegacyPassPass(PassRegistry &);
	void initializeASanGlobalsMetadataWrapperPassPass(PassRegistry &);			void initializeASanGlobalsMetadataWrapperPassPass(PassRegistry &);
	void initializeAddressSanitizerLegacyPassPass(PassRegistry &);			void initializeAddressSanitizerLegacyPassPass(PassRegistry &);
	void initializeAggressiveInstCombinerLegacyPassPass(PassRegistry&);			void initializeAggressiveInstCombinerLegacyPassPass(PassRegistry&);
	void initializeAliasSetPrinterPass(PassRegistry&);			void initializeAliasSetPrinterPass(PassRegistry&);
	void initializeAlignmentFromAssumptionsPass(PassRegistry&);			void initializeAlignmentFromAssumptionsPass(PassRegistry&);
	void initializeAlwaysInlinerLegacyPassPass(PassRegistry&);			void initializeAlwaysInlinerLegacyPassPass(PassRegistry&);
	void initializeAssumeSimplifyPassLegacyPassPass(PassRegistry &);			void initializeAssumeSimplifyPassLegacyPassPass(PassRegistry &);
	▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines
	void initializeEarlyTailDuplicatePass(PassRegistry&);			void initializeEarlyTailDuplicatePass(PassRegistry&);
	void initializeEdgeBundlesPass(PassRegistry&);			void initializeEdgeBundlesPass(PassRegistry&);
	void initializeEHContGuardCatchretPass(PassRegistry &);			void initializeEHContGuardCatchretPass(PassRegistry &);
	void initializeEliminateAvailableExternallyLegacyPassPass(PassRegistry&);			void initializeEliminateAvailableExternallyLegacyPassPass(PassRegistry&);
	void initializeEntryExitInstrumenterPass(PassRegistry&);			void initializeEntryExitInstrumenterPass(PassRegistry&);
	void initializeExpandMemCmpPassPass(PassRegistry&);			void initializeExpandMemCmpPassPass(PassRegistry&);
	void initializeExpandPostRAPass(PassRegistry&);			void initializeExpandPostRAPass(PassRegistry&);
	void initializeExpandReductionsPass(PassRegistry&);			void initializeExpandReductionsPass(PassRegistry&);
	void initializeMakeGuardsExplicitLegacyPassPass(PassRegistry&);			void initializeMakeGuardsExplicitLegacyPassPass(PassRegistry &);
	void initializeExternalAAWrapperPassPass(PassRegistry&);			void initializeExternalAAWrapperPassPass(PassRegistry &);
	void initializeFEntryInserterPass(PassRegistry&);			void initializeFEntryInserterPass(PassRegistry &);
				void initializeFSProfileLoaderPassPass(PassRegistry &);
	void initializeFinalizeISelPass(PassRegistry&);			void initializeFinalizeISelPass(PassRegistry &);
	void initializeFinalizeMachineBundlesPass(PassRegistry&);			void initializeFinalizeMachineBundlesPass(PassRegistry &);
	void initializeFixIrreduciblePass(PassRegistry &);			void initializeFixIrreduciblePass(PassRegistry &);
	void initializeFixupStatepointCallerSavedPass(PassRegistry&);			void initializeFixupStatepointCallerSavedPass(PassRegistry&);
	void initializeFlattenCFGPassPass(PassRegistry&);			void initializeFlattenCFGPassPass(PassRegistry&);
	void initializeFloat2IntLegacyPassPass(PassRegistry&);			void initializeFloat2IntLegacyPassPass(PassRegistry&);
	void initializeForceFunctionAttrsLegacyPassPass(PassRegistry&);			void initializeForceFunctionAttrsLegacyPassPass(PassRegistry&);
	void initializeForwardControlFlowIntegrityPass(PassRegistry&);			void initializeForwardControlFlowIntegrityPass(PassRegistry&);
	void initializeFuncletLayoutPass(PassRegistry&);			void initializeFuncletLayoutPass(PassRegistry&);
	void initializeFunctionImportLegacyPassPass(PassRegistry&);			void initializeFunctionImportLegacyPassPass(PassRegistry&);
	▲ Show 20 Lines • Show All 287 Lines • Show Last 20 Lines

llvm/include/llvm/LTO/Config.h

Show First 20 Lines • Show All 165 Lines • ▼ Show 20 Lines	struct Config {
bool TimeTraceEnabled = false;		bool TimeTraceEnabled = false;

/// Time trace granularity.		/// Time trace granularity.
unsigned TimeTraceGranularity = 500;		unsigned TimeTraceGranularity = 500;

bool ShouldDiscardValueNames = true;		bool ShouldDiscardValueNames = true;
DiagnosticHandlerFunction DiagHandler;		DiagnosticHandlerFunction DiagHandler;

		/// Add FS AFDO discriminator.
		bool AddFSDiscriminator = false;

/// If this field is set, LTO will write input file paths and symbol		/// If this field is set, LTO will write input file paths and symbol
/// resolutions here in llvm-lto2 command line flag format. This can be		/// resolutions here in llvm-lto2 command line flag format. This can be
/// used for testing and for running the LTO pipeline outside of the linker		/// used for testing and for running the LTO pipeline outside of the linker
/// with llvm-lto2.		/// with llvm-lto2.
std::unique_ptr<raw_ostream> ResolutionFile;		std::unique_ptr<raw_ostream> ResolutionFile;

/// Tunable parameters for passes in the default pipelines.		/// Tunable parameters for passes in the default pipelines.
PipelineTuningOptions PTO;		PipelineTuningOptions PTO;
▲ Show 20 Lines • Show All 111 Lines • Show Last 20 Lines

llvm/include/llvm/ProfileData/SampleProfReader.h

Show First 20 Lines • Show All 228 Lines • ▼ Show 20 Lines
#include "llvm/IR/DiagnosticInfo.h"		#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/ProfileSummary.h"		#include "llvm/IR/ProfileSummary.h"
#include "llvm/ProfileData/GCOV.h"		#include "llvm/ProfileData/GCOV.h"
#include "llvm/ProfileData/SampleProf.h"		#include "llvm/ProfileData/SampleProf.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorOr.h"		#include "llvm/Support/ErrorOr.h"
		#include "llvm/Support/FSAFDODiscriminator.h"
#include "llvm/Support/MemoryBuffer.h"		#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/SymbolRemappingReader.h"		#include "llvm/Support/SymbolRemappingReader.h"
#include <algorithm>		#include <algorithm>
#include <cstdint>		#include <cstdint>
#include <memory>		#include <memory>
#include <string>		#include <string>
#include <system_error>		#include <system_error>
#include <vector>		#include <vector>
▲ Show 20 Lines • Show All 92 Lines • ▼ Show 20 Lines
/// is useful for debugging and testing, while the binary format is more		/// is useful for debugging and testing, while the binary format is more
/// compact and I/O efficient. They can both be used interchangeably.		/// compact and I/O efficient. They can both be used interchangeably.
class SampleProfileReader {		class SampleProfileReader {
public:		public:
SampleProfileReader(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,		SampleProfileReader(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
SampleProfileFormat Format = SPF_None)		SampleProfileFormat Format = SPF_None)
: Profiles(0), Ctx(C), Buffer(std::move(B)), Format(Format) {}		: Profiles(0), Ctx(C), Buffer(std::move(B)), Format(Format) {}

		void setDiscriminatorMaskedBitFrom(uint32_t B) { MaskedBitFrom = B; }

		inline uint32_t getDiscriminatorMask() const {
		assert((MaskedBitFrom != 0) && "MaskedBitFrom is not set properly");
		return getN1Bits(MaskedBitFrom);
		}

virtual ~SampleProfileReader() = default;		virtual ~SampleProfileReader() = default;

/// Read and validate the file header.		/// Read and validate the file header.
virtual std::error_code readHeader() = 0;		virtual std::error_code readHeader() = 0;

/// The interface to read sample profiles from the associated file.		/// The interface to read sample profiles from the associated file.
std::error_code read() {		std::error_code read() {
if (std::error_code EC = readImpl())		if (std::error_code EC = readImpl())
▲ Show 20 Lines • Show All 151 Lines • ▼ Show 20 Lines	protected:

/// \brief The format of sample.		/// \brief The format of sample.
SampleProfileFormat Format = SPF_None;		SampleProfileFormat Format = SPF_None;

/// \brief The current module being compiled if SampleProfileReader		/// \brief The current module being compiled if SampleProfileReader
/// is used by compiler. If SampleProfileReader is used by other		/// is used by compiler. If SampleProfileReader is used by other
/// tools which are not compiler, M is usually nullptr.		/// tools which are not compiler, M is usually nullptr.
const Module *M = nullptr;		const Module *M = nullptr;

		/// Discriminators of the samples in this class are masked to bits 0
		/// through MaskedBitFrom (0-based). Default should be for the base.
		unsigned MaskedBitFrom = 31;
};		};

class SampleProfileReaderText : public SampleProfileReader {		class SampleProfileReaderText : public SampleProfileReader {
public:		public:
SampleProfileReaderText(std::unique_ptr<MemoryBuffer> B, LLVMContext &C)		SampleProfileReaderText(std::unique_ptr<MemoryBuffer> B, LLVMContext &C)
: SampleProfileReader(std::move(B), C, SPF_Text) {}		: SampleProfileReader(std::move(B), C, SPF_Text) {}

/// Read and validate the file header.		/// Read and validate the file header.
▲ Show 20 Lines • Show All 310 Lines • Show Last 20 Lines

llvm/include/llvm/Support/FSAFDODiscriminator.h

This file was added.

				//===- llvm/Support/FSAFDODiscriminator.h ------------------------ C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines the bits to be used by various FSAFDO passes.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_SUPPORT_FSAFDODISCRIMINATOR_H
				#define LLVM_SUPPORT_FSAFDODISCRIMINATOR_H

				#define BASE_DIS_BIT_BEG 0
				#define BASE_DIS_BIT_END 7

				#define PASS_1_DIS_BIT_BEG 8
				snehasish: A few questions about the discriminator bits:
				* Depending on the transformation in the target pass the requirement of bits may be different, i.e. 5 bits for each may be too many or too few. Do you have any data to share about how many bits are used by each?
				* How do we alert authors of new target optimizations (or code refactoring) that additional discriminator bits are needed to disambiguate? Would a late-stage, analysis-only pass which enumerates different instructions with the same debug+discriminator info be useful to commit?
				* If I understand correctly, we bump the bit for each level of cloning. This seems to be a less efficient coding scheme: max 5 bits, where by enumeration you could identify 31 clones? Have you considered other coding schemes?
				davidxl:
				> Depending on the transformation in the target pass the requirement of bits may be different, i.e. 5 bits for each may be too many or too few. Do you have any data to share about how many bits are used by each?
				I assume most of the transformations produce few clones except for unrolling (which depends on unroll factor).
				> How do we alert authors of new target optimizations (or code refactoring) that additional discriminator bits are needed to disambiguate? Would a late-stage, analysis-only pass which enumerates different instructions with the same debug+discriminator info be useful to commit?
				The problem with this is that the authors won't have any means to change anything. Rong and I discussed this. Longer term, when this becomes an issue, increasing the size of the discriminator container type will be the way to go.
				> If I understand correctly, we bump the bit for each level of cloning. This seems to be a less efficient coding scheme: max 5 bits, where by enumeration you could identify 31 clones? Have you considered other coding schemes?
				The biggest advantage of fixed width is simplicity, I think.
				snehasish:
				> The biggest advantage of fixed width is simplicity, I think.
				Not sure if I made the point clearly, so repeating - I'm trying to distinguish between using 5 values (one per bit) in a pass vs 2^5-1 values which can be enumerated by 5 bits. For example, with a loop unroll factor of >5, this coding scheme will not be able to assign unique ids for all clones. Is this understanding correct? I think redistributing the bits based on the pass transformations should be sufficient to avoid a more complex coding scheme. I agree using fixed width bits makes for a simple implementation and we should prefer this unless data shows otherwise.
				davidxl: Rong can answer this question more thoroughly. By just looking at line 373 of the FlowSensitiveSampleProfile.cpp file, it seems it is actually sequentially increasing FSD, instead of bumping the bit?
				snehasish: Thanks for the pointer and Rong clarified the usage. My understanding was incorrect and the encoding space is not as limited as I thought. Since there is no measurable performance difference the current approach seems simplest.
				#define PASS_1_DIS_BIT_END 13

				#define PASS_2_DIS_BIT_BEG 14
				#define PASS_2_DIS_BIT_END 19

				#define PASS_3_DIS_BIT_BEG 20
				#define PASS_3_DIS_BIT_END 25

				#define PASS_LAST_DIS_BIT_BEG 26
				#define PASS_LAST_DIS_BIT_END 31

				// Set bit 0 .. n to 1.
				static inline unsigned getN1Bits(int N) {
				if (N >= 31)
				return 0xFFFFFFFF;
				return (1U << (N + 1)) - 1;
				}

				// Given a discriminator n, return the number of the bucket it is in.
				inline static unsigned getFSBucket(unsigned int DiscriminatorVal) {
				unsigned int N = DiscriminatorVal;
				if (N == 0)
				return 0;
				if (N &
				(getN1Bits(PASS_LAST_DIS_BIT_BEG - 1) ^ getN1Bits(PASS_LAST_DIS_BIT_END)))
				return 5;
				if (N & (getN1Bits(PASS_3_DIS_BIT_BEG - 1) ^ getN1Bits(PASS_3_DIS_BIT_END)))
				return 4;
				if (N & (getN1Bits(PASS_2_DIS_BIT_BEG - 1) ^ getN1Bits(PASS_2_DIS_BIT_END)))
				return 3;
				if (N & (getN1Bits(PASS_1_DIS_BIT_BEG - 1) ^ getN1Bits(PASS_1_DIS_BIT_END)))
				return 2;
				return 1;
				}

				inline unsigned getFSBucketVal(int LowBit, int HighBit, unsigned N) {
				unsigned int V = N >> LowBit;
				return V & getN1Bits(HighBit - LowBit);
				}

				inline unsigned getFSBucketVal(int B, unsigned N) {
				switch (B) {
				case 1:
				return getFSBucketVal(BASE_DIS_BIT_BEG, BASE_DIS_BIT_END, N);
				case 2:
				return getFSBucketVal(PASS_1_DIS_BIT_BEG, PASS_1_DIS_BIT_END, N);
				case 3:
				return getFSBucketVal(PASS_2_DIS_BIT_BEG, PASS_2_DIS_BIT_END, N);
				case 4:
				return getFSBucketVal(PASS_3_DIS_BIT_BEG, PASS_3_DIS_BIT_END, N);
				case 5:
				return getFSBucketVal(PASS_LAST_DIS_BIT_BEG, PASS_LAST_DIS_BIT_END, N);
				default:
				llvm_unreachable("Wrong FSBucket Number");
				Lint: Pre-merge checks: clang-tidy: error: use of undeclared identifier 'llvm_unreachable' [clang-diagnostic-error]
				}
				}

				inline void setFSBucketVal(int LowBit, int HighBit, unsigned Val, unsigned &N) {
				unsigned int V = Val & getN1Bits(HighBit - LowBit);
				V = V << LowBit;
				N |= V;
				}

				inline void setFSBucketVal(int B, unsigned Val, unsigned &N) {
				switch (B) {
				case 1:
				return setFSBucketVal(BASE_DIS_BIT_BEG, BASE_DIS_BIT_END, Val, N);
				case 2:
				return setFSBucketVal(PASS_1_DIS_BIT_BEG, PASS_1_DIS_BIT_END, Val, N);
				case 3:
				return setFSBucketVal(PASS_2_DIS_BIT_BEG, PASS_2_DIS_BIT_END, Val, N);
				case 4:
				return setFSBucketVal(PASS_3_DIS_BIT_BEG, PASS_3_DIS_BIT_END, Val, N);
				case 5:
				return setFSBucketVal(PASS_LAST_DIS_BIT_BEG, PASS_LAST_DIS_BIT_END, Val, N);
				default:
				llvm_unreachable("Wrong FSBucket Number");
				Lint: Pre-merge checks: clang-tidy: error: use of undeclared identifier 'llvm_unreachable' [clang-diagnostic-error]
				}
				}

				#endif /* LLVM_SUPPORT_FSAFDODISCRIMINATOR_H */
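
For readers following the bit-layout discussion above, here is a minimal, self-contained sketch of packing and unpacking a fixed-width field inside a 32-bit discriminator. The names `lowBitsMask`, `extractField` and `insertField` are hypothetical and not part of the patch; the bit positions in the usage note mirror the `PASS_1_DIS_BIT_BEG`/`PASS_1_DIS_BIT_END` macros. The sketch shifts the field down before masking, and it clears the field before writing, which is an assumption about the desired behavior rather than what the posted setter does (the posted setter only ORs bits in).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mask with bits 0..N set (N is 0-based).
// Unsigned arithmetic avoids shifting into the sign bit.
inline uint32_t lowBitsMask(unsigned N) {
  return N >= 31 ? 0xFFFFFFFFu : (1u << (N + 1)) - 1u;
}

// Hypothetical helper: read the field stored in bits [LowBit, HighBit].
// Shift the field down to bit 0 first, then mask to the field width.
inline uint32_t extractField(unsigned LowBit, unsigned HighBit, uint32_t D) {
  return (D >> LowBit) & lowBitsMask(HighBit - LowBit);
}

// Hypothetical helper: return D with bits [LowBit, HighBit] replaced by Val.
// Bits of Val that do not fit in the field are dropped.
inline uint32_t insertField(unsigned LowBit, unsigned HighBit, uint32_t Val,
                            uint32_t D) {
  uint32_t FieldMask = lowBitsMask(HighBit - LowBit) << LowBit;
  return (D & ~FieldMask) | ((Val << LowBit) & FieldMask);
}
```

With the layout in this header, `insertField(8, 13, 5, 0)` stores the value 5 in the first pass field, and `extractField(8, 13, D)` reads it back. A 6-bit field used as a sequential counter can distinguish 63 nonzero values, which is the point the review thread converged on.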

llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h

Show First 20 Lines • Show All 50 Lines • ▼ Show 20 Lines

template <typename BlockT> struct IRTraits;		template <typename BlockT> struct IRTraits;
template <> struct IRTraits<BasicBlock> {		template <> struct IRTraits<BasicBlock> {
using InstructionT = Instruction;		using InstructionT = Instruction;
using BasicBlockT = BasicBlock;		using BasicBlockT = BasicBlock;
using FunctionT = Function;		using FunctionT = Function;
using BlockFrequencyInfoT = BlockFrequencyInfo;		using BlockFrequencyInfoT = BlockFrequencyInfo;
using LoopT = Loop;		using LoopT = Loop;
using LoopInfoT = LoopInfo;		using LoopInfoPtrT = std::unique_ptr<LoopInfo>;
		using DominatorTreePtrT = std::unique_ptr<DominatorTree>;
		using PostDominatorTreeT = PostDominatorTree;
		using PostDominatorTreePtrT = std::unique_ptr<PostDominatorTree>;
using OptRemarkEmitterT = OptimizationRemarkEmitter;		using OptRemarkEmitterT = OptimizationRemarkEmitter;
using OptRemarkAnalysisT = OptimizationRemarkAnalysis;		using OptRemarkAnalysisT = OptimizationRemarkAnalysis;
using DominatorTreeT = DominatorTree;
using PostDominatorTreeT = PostDominatorTree;
static Function &getFunction(Function &F) { return F; }		static Function &getFunction(Function &F) { return F; }
static const BasicBlock *getEntryBB(const Function *F) {		static const BasicBlock *getEntryBB(const Function *F) {
return &F->getEntryBlock();		return &F->getEntryBlock();
}		}
};		};

} // end namespace afdo_detail		} // end namespace afdo_detail

extern cl::opt<unsigned> SampleProfileMaxPropagateIterations;		extern cl::opt<unsigned> SampleProfileMaxPropagateIterations;
extern cl::opt<unsigned> SampleProfileRecordCoverage;		extern cl::opt<unsigned> SampleProfileRecordCoverage;
extern cl::opt<unsigned> SampleProfileSampleCoverage;		extern cl::opt<unsigned> SampleProfileSampleCoverage;
extern cl::opt<bool> NoWarnSampleUnused;		extern cl::opt<bool> NoWarnSampleUnused;

template <typename BT> class SampleProfileLoaderBaseImpl {		template <typename BT> class SampleProfileLoaderBaseImpl {
public:		public:
SampleProfileLoaderBaseImpl(std::string Name) : Filename(Name) {}		SampleProfileLoaderBaseImpl(std::string Name) : Filename(Name) {}
void dump() { Reader->dump(); }		void dump() { Reader->dump(); }

using InstructionT = typename afdo_detail::IRTraits<BT>::InstructionT;		using InstructionT = typename afdo_detail::IRTraits<BT>::InstructionT;
using BasicBlockT = typename afdo_detail::IRTraits<BT>::BasicBlockT;		using BasicBlockT = typename afdo_detail::IRTraits<BT>::BasicBlockT;
using BlockFrequencyInfoT =		using BlockFrequencyInfoT =
typename afdo_detail::IRTraits<BT>::BlockFrequencyInfoT;		typename afdo_detail::IRTraits<BT>::BlockFrequencyInfoT;
using FunctionT = typename afdo_detail::IRTraits<BT>::FunctionT;		using FunctionT = typename afdo_detail::IRTraits<BT>::FunctionT;
using LoopT = typename afdo_detail::IRTraits<BT>::LoopT;		using LoopT = typename afdo_detail::IRTraits<BT>::LoopT;
using LoopInfoT = typename afdo_detail::IRTraits<BT>::LoopInfoT;		using LoopInfoPtrT = typename afdo_detail::IRTraits<BT>::LoopInfoPtrT;
		using DominatorTreePtrT =
		typename afdo_detail::IRTraits<BT>::DominatorTreePtrT;
		using PostDominatorTreePtrT =
		typename afdo_detail::IRTraits<BT>::PostDominatorTreePtrT;
		using PostDominatorTreeT =
		typename afdo_detail::IRTraits<BT>::PostDominatorTreeT;
using OptRemarkEmitterT =		using OptRemarkEmitterT =
typename afdo_detail::IRTraits<BT>::OptRemarkEmitterT;		typename afdo_detail::IRTraits<BT>::OptRemarkEmitterT;
using OptRemarkAnalysisT =		using OptRemarkAnalysisT =
typename afdo_detail::IRTraits<BT>::OptRemarkAnalysisT;		typename afdo_detail::IRTraits<BT>::OptRemarkAnalysisT;
using DominatorTreeT = typename afdo_detail::IRTraits<BT>::DominatorTreeT;
using PostDominatorTreeT =
typename afdo_detail::IRTraits<BT>::PostDominatorTreeT;

using BlockWeightMap = DenseMap<const BasicBlockT *, uint64_t>;		using BlockWeightMap = DenseMap<const BasicBlockT *, uint64_t>;
using EquivalenceClassMap =		using EquivalenceClassMap =
DenseMap<const BasicBlockT *, const BasicBlockT *>;		DenseMap<const BasicBlockT *, const BasicBlockT *>;
using Edge = std::pair<const BasicBlockT *, const BasicBlockT *>;		using Edge = std::pair<const BasicBlockT *, const BasicBlockT *>;
using EdgeWeightMap = DenseMap<Edge, uint64_t>;		using EdgeWeightMap = DenseMap<Edge, uint64_t>;
using BlockEdgeMap =		using BlockEdgeMap =
DenseMap<const BasicBlockT *, SmallVector<const BasicBlockT *, 8>>;		DenseMap<const BasicBlockT *, SmallVector<const BasicBlockT *, 8>>;
Show All 20 Lines	protected:
void printEdgeWeight(raw_ostream &OS, Edge E);		void printEdgeWeight(raw_ostream &OS, Edge E);
void printBlockWeight(raw_ostream &OS, const BasicBlockT *BB) const;		void printBlockWeight(raw_ostream &OS, const BasicBlockT *BB) const;
void printBlockEquivalence(raw_ostream &OS, const BasicBlockT *BB);		void printBlockEquivalence(raw_ostream &OS, const BasicBlockT *BB);
bool computeBlockWeights(FunctionT &F);		bool computeBlockWeights(FunctionT &F);
void findEquivalenceClasses(FunctionT &F);		void findEquivalenceClasses(FunctionT &F);
void findEquivalencesFor(BasicBlockT *BB1,		void findEquivalencesFor(BasicBlockT *BB1,
ArrayRef<BasicBlockT *> Descendants,		ArrayRef<BasicBlockT *> Descendants,
PostDominatorTreeT *DomTree);		PostDominatorTreeT *DomTree);

void propagateWeights(FunctionT &F);		void propagateWeights(FunctionT &F);
uint64_t visitEdge(Edge E, unsigned NumUnknownEdges, Edge UnknownEdge);		uint64_t visitEdge(Edge E, unsigned NumUnknownEdges, Edge UnknownEdge);
void buildEdges(FunctionT &F);		void buildEdges(FunctionT &F);
bool propagateThroughEdges(FunctionT &F, bool UpdateBlockCount);		bool propagateThroughEdges(FunctionT &F, bool UpdateBlockCount);
void clearFunctionData();		void clearFunctionData();
void computeDominanceAndLoopInfo(FunctionT &F);		void computeDominanceAndLoopInfo(FunctionT &F);
bool		bool
computeAndPropagateWeights(FunctionT &F,		computeAndPropagateWeights(FunctionT &F,
Show All 22 Lines	protected:
///		///
/// Two blocks BB1 and BB2 are in the same equivalence class if they		/// Two blocks BB1 and BB2 are in the same equivalence class if they
/// dominate and post-dominate each other, and they are in the same loop		/// dominate and post-dominate each other, and they are in the same loop
/// nest. When this happens, the two blocks are guaranteed to execute		/// nest. When this happens, the two blocks are guaranteed to execute
/// the same number of times.		/// the same number of times.
EquivalenceClassMap EquivalenceClass;		EquivalenceClassMap EquivalenceClass;

/// Dominance, post-dominance and loop information.		/// Dominance, post-dominance and loop information.
std::unique_ptr<DominatorTreeT> DT;		DominatorTreePtrT DT;
std::unique_ptr<PostDominatorTreeT> PDT;		PostDominatorTreePtrT PDT;
std::unique_ptr<LoopInfoT> LI;		LoopInfoPtrT LI;

/// Predecessors for each basic block in the CFG.		/// Predecessors for each basic block in the CFG.
BlockEdgeMap Predecessors;		BlockEdgeMap Predecessors;

/// Successors for each basic block in the CFG.		/// Successors for each basic block in the CFG.
BlockEdgeMap Successors;		BlockEdgeMap Successors;

/// Profile coverage tracker.		/// Profile coverage tracker.
▲ Show 20 Lines • Show All 92 Lines • ▼ Show 20 Lines	if (!FS)
return std::error_code();		return std::error_code();

const DebugLoc &DLoc = Inst.getDebugLoc();		const DebugLoc &DLoc = Inst.getDebugLoc();
if (!DLoc)		if (!DLoc)
return std::error_code();		return std::error_code();

const DILocation *DIL = DLoc;		const DILocation *DIL = DLoc;
uint32_t LineOffset = FunctionSamples::getOffset(DIL);		uint32_t LineOffset = FunctionSamples::getOffset(DIL);
uint32_t Discriminator = DIL->getBaseDiscriminator();		uint32_t Discriminator;
		if (EnableFSDiscriminator)
		Discriminator = DIL->getDiscriminator();
		else
		Discriminator = DIL->getBaseDiscriminator();
ErrorOr<uint64_t> R = FS->findSamplesAt(LineOffset, Discriminator);		ErrorOr<uint64_t> R = FS->findSamplesAt(LineOffset, Discriminator);
if (R) {		if (R) {
bool FirstMark =		bool FirstMark =
CoverageTracker.markSamplesUsed(FS, LineOffset, Discriminator, R.get());		CoverageTracker.markSamplesUsed(FS, LineOffset, Discriminator, R.get());
if (FirstMark) {		if (FirstMark) {
ORE->emit([&]() {		ORE->emit([&]() {
OptRemarkAnalysisT Remark(DEBUG_TYPE, "AppliedSamples", &Inst);		OptRemarkAnalysisT Remark(DEBUG_TYPE, "AppliedSamples", &Inst);
Remark << "Applied " << ore::NV("NumSamples", *R);		Remark << "Applied " << ore::NV("NumSamples", *R);
Remark << " samples from profile (offset: ";		Remark << " samples from profile (offset: ";
Remark << ore::NV("LineOffset", LineOffset);		Remark << ore::NV("LineOffset", LineOffset);
if (Discriminator) {		if (Discriminator) {
Remark << ".";		Remark << ".";
Remark << ore::NV("Discriminator", Discriminator);		Remark << ore::NV("Discriminator", Discriminator);
}		}
Remark << ")";		Remark << ")";
return Remark;		return Remark;
});		});
}		}
LLVM_DEBUG(dbgs() << " " << DLoc.getLine() << "."		LLVM_DEBUG(dbgs() << " " << DLoc.getLine() << "." << Discriminator << ":"
<< DIL->getBaseDiscriminator() << ":" << Inst		<< Inst << " (line offset: " << LineOffset << "."
<< " (line offset: " << LineOffset << "."		<< Discriminator << " - weight: " << R.get() << ")\n");
<< DIL->getBaseDiscriminator() << " - weight: " << R.get()
<< ")\n");
}		}
return R;		return R;
}		}

/// Compute the weight of a basic block.		/// Compute the weight of a basic block.
///		///
/// The weight of basic block \p BB is the maximum weight of all the		/// The weight of basic block \p BB is the maximum weight of all the
/// instructions in BB.		/// instructions in BB.
▲ Show 20 Lines • Show All 153 Lines • ▼ Show 20 Lines	for (auto &BB : F) {
// 2- BB2 post-dominates BB1.		// 2- BB2 post-dominates BB1.
// 3- BB1 and BB2 are in the same loop nest.		// 3- BB1 and BB2 are in the same loop nest.
//		//
// If all those conditions hold, it means that BB2 is executed		// If all those conditions hold, it means that BB2 is executed
// as many times as BB1, so they are placed in the same equivalence		// as many times as BB1, so they are placed in the same equivalence
// class by making BB2's equivalence class be BB1.		// class by making BB2's equivalence class be BB1.
DominatedBBs.clear();		DominatedBBs.clear();
DT->getDescendants(BB1, DominatedBBs);		DT->getDescendants(BB1, DominatedBBs);
findEquivalencesFor(BB1, DominatedBBs, PDT.get());		findEquivalencesFor(BB1, DominatedBBs, &*PDT);

LLVM_DEBUG(printBlockEquivalence(dbgs(), BB1));		LLVM_DEBUG(printBlockEquivalence(dbgs(), BB1));
}		}

// Assign weights to equivalence classes.		// Assign weights to equivalence classes.
//		//
// All the basic blocks in the same equivalence class will execute		// All the basic blocks in the same equivalence class will execute
// the same number of times. Since we know that the head block in		// the same number of times. Since we know that the head block in
▲ Show 20 Lines • Show All 419 Lines • ▼ Show 20 Lines	Func.getContext().diagnose(DiagnosticInfoSampleProfile(
": Function profile not used",		": Function profile not used",
DS_Warning));		DS_Warning));
return 0;		return 0;
}		}

template <typename BT>		template <typename BT>
void SampleProfileLoaderBaseImpl<BT>::computeDominanceAndLoopInfo(		void SampleProfileLoaderBaseImpl<BT>::computeDominanceAndLoopInfo(
FunctionT &F) {		FunctionT &F) {
DT.reset(new DominatorTreeT);		DT.reset(new DominatorTree);
DT->recalculate(F);		DT->recalculate(F);

PDT.reset(new PostDominatorTree(F));		PDT.reset(new PostDominatorTree(F));

LI.reset(new LoopInfoT);		LI.reset(new LoopInfo);
LI->analyze(*DT);		LI->analyze(*DT);
}		}

#undef DEBUG_TYPE		#undef DEBUG_TYPE

} // namespace llvm		} // namespace llvm
#endif // LLVM_TRANSFORMS_UTILS_SAMPLEPROFILELOADERBASEIMPL_H		#endif // LLVM_TRANSFORMS_UTILS_SAMPLEPROFILELOADERBASEIMPL_H

llvm/lib/CodeGen/CMakeLists.txt

Show All 27 Lines	add_llvm_component_library(LLVMCodeGen
ExecutionDomainFix.cpp		ExecutionDomainFix.cpp
ExpandMemCmp.cpp		ExpandMemCmp.cpp
ExpandPostRAPseudos.cpp		ExpandPostRAPseudos.cpp
ExpandReductions.cpp		ExpandReductions.cpp
FaultMaps.cpp		FaultMaps.cpp
FEntryInserter.cpp		FEntryInserter.cpp
FinalizeISel.cpp		FinalizeISel.cpp
FixupStatepointCallerSaved.cpp		FixupStatepointCallerSaved.cpp
		FlowSensitiveSampleProfile.cpp
FuncletLayout.cpp		FuncletLayout.cpp
GCMetadata.cpp		GCMetadata.cpp
GCMetadataPrinter.cpp		GCMetadataPrinter.cpp
GCRootLowering.cpp		GCRootLowering.cpp
GCStrategy.cpp		GCStrategy.cpp
GlobalMerge.cpp		GlobalMerge.cpp
HardwareLoops.cpp		HardwareLoops.cpp
IfConversion.cpp		IfConversion.cpp
▲ Show 20 Lines • Show All 178 Lines • Show Last 20 Lines

llvm/lib/CodeGen/FlowSensitiveSampleProfile.cpp

This file was added.

				//===-------- FlowSensitiveSampleProfile.cpp: Flow Sensitive SampleFDO-----===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file provides the implementation of the flow sensitive SampleFDO.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/CodeGen/FlowSensitiveSampleProfile.h"
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/DenseSet.h"
				#include "llvm/Analysis/BlockFrequencyInfoImpl.h"
				#include "llvm/IR/Function.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h"
				#include "llvm/Transforms/Utils/SampleProfileLoaderBaseUtil.h"

				using namespace llvm;
				using namespace sampleprof;
				using namespace llvm::sampleprofutil;
				using ProfileCount = Function::ProfileCount;

				#define DEBUG_TYPE "fs"
				#define DEBUG_TYPE_DISCRIMINATOR DEBUG_TYPE "-discriminators"
				#define DEBUG_TYPE_LOADER DEBUG_TYPE "-profile-loader"

				extern cl::opt<bool> EnableFSDiscriminator;
				static cl::opt<bool>
				DisableFSProfileLoader("disable-fs-profile-loader", cl::Hidden,
				cl::init(false),
				cl::desc("Disable flow sensitive profile loading"));
				static cl::opt<bool> EnableFSBranchProb(
				"enable-fs-branchprob", cl::Hidden, cl::init(true),
				cl::desc("Enable setting flow sensitive branch probabilities"));

				static cl::opt<unsigned> FSProfileDebugProbDiffThreshold(
				"fs-profile-debug-prob-diff-threshold", cl::init(10),
				cl::desc("Only show debug message if the branch probability is greater than "
				"this value (in percentage)."));

				static cl::opt<unsigned> FSProfileDebugBWThreshold(
				"fs-profile-debug-bw-threshold", cl::init(10000),
				cl::desc("Only show debug message if the source branch weight is greater "
				"than this value."));

				static cl::opt<bool> ViewBFIBefore("fs-viewbfi-before", cl::Hidden,
				cl::init(false),
				cl::desc("View BFI before FS loader"));
				static cl::opt<bool> ViewBFIAfter("fs-viewbfi-after", cl::Hidden,
				cl::init(false),
				cl::desc("View BFI after FS loader"));

				// Internal option used to control BFI display only after MBP pass.
				// Defined in CodeGen/MachineBlockFrequencyInfo.cpp:
				// -view-block-layout-with-bfi={none | fraction | integer | count}
				extern cl::opt<GVDAGType> ViewBlockLayoutWithBFI;

				// Command line option to specify the name of the function for CFG dump
				// Defined in Analysis/BlockFrequencyInfo.cpp: -view-bfi-func-name=
				extern cl::opt<std::string> ViewBlockFreqFuncName;

				char AddFSDiscriminators::ID = 0;
				char FSProfileLoaderPass::ID = 0;

				INITIALIZE_PASS(AddFSDiscriminators, DEBUG_TYPE_DISCRIMINATOR,
				"Add Flow Sensitive Discriminators",
				/* cfg = */ false, /* is_analysis = */ false)

				INITIALIZE_PASS_BEGIN(FSProfileLoaderPass, DEBUG_TYPE_LOADER,
				"Load Flow Sensitive Profile",
				/* cfg = */ false, /* is_analysis = */ false)
				INITIALIZE_PASS_DEPENDENCY(MachineBlockFrequencyInfo)
				INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
				INITIALIZE_PASS_DEPENDENCY(MachinePostDominatorTree)
				INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)
				INITIALIZE_PASS_DEPENDENCY(MachineOptimizationRemarkEmitterPass)
				INITIALIZE_PASS_END(FSProfileLoaderPass, DEBUG_TYPE_LOADER,
				"Load Flow Sensitive Profile",
				/* cfg = */ false, /* is_analysis = */ false)

				char &llvm::AddFSDiscriminatorsID = AddFSDiscriminators::ID;
				char &llvm::FSProfileLoaderPassID = FSProfileLoaderPass::ID;

				FunctionPass *llvm::createAddFSDiscriminatorsPass(unsigned LowBit,
				unsigned HighBit) {
				return new AddFSDiscriminators(LowBit, HighBit);
				}
				FunctionPass *llvm::createFSProfileLoaderPass(std::string File, unsigned LowBit,
				unsigned HighBit) {
				return new FSProfileLoaderPass(File, LowBit, HighBit);
				}

				static uint64_t getCallStackHash(const MachineBasicBlock &BB,
				const MachineInstr &MI,
				const DILocation *DIL) {
				uint64_t Ret = MD5Hash(std::to_string(DIL->getLine()));
				Ret ^= MD5Hash(BB.getName());
				Ret ^= MD5Hash(DIL->getScope()->getSubprogram()->getLinkageName());
				for (DIL = DIL->getInlinedAt(); DIL; DIL = DIL->getInlinedAt()) {
				Ret ^= MD5Hash(std::to_string(DIL->getLine()));
				Ret ^= MD5Hash(DIL->getScope()->getSubprogram()->getLinkageName());
				}
				return Ret;
				}
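
As an aside on `getCallStackHash` above: it folds per-frame hashes together with XOR while walking the inlined-at chain. The following self-contained sketch mimics that shape so its properties are easy to check in isolation. `Frame`, `hashString` and `callStackHash` are hypothetical names, and `std::hash` stands in for `MD5Hash` purely to avoid an LLVM dependency.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one location in an inlined call stack.
struct Frame {
  unsigned Line;
  std::string LinkageName;
};

// Stand-in for MD5Hash; std::hash is used only to keep the sketch standalone.
inline uint64_t hashString(const std::string &S) {
  return static_cast<uint64_t>(std::hash<std::string>{}(S));
}

// XOR together the block name and, for every frame, the hash of its line
// number and the hash of its linkage name. Frames[0] is the innermost
// location; later entries follow the inlined-at chain outward.
inline uint64_t callStackHash(const std::string &BlockName,
                              const std::vector<Frame> &Frames) {
  uint64_t Ret = hashString(BlockName);
  for (const Frame &F : Frames) {
    Ret ^= hashString(std::to_string(F.Line));
    Ret ^= hashString(F.LinkageName);
  }
  return Ret;
}
```

Because XOR is commutative and self-inverse, a hash built this way does not depend on frame order, and two identical frames cancel each other out. Whether that matters for the discriminator assignment in this patch is a question for the author; the sketch only makes the property visible.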

				namespace llvm {

				namespace afdo_detail {
				template <> struct IRTraits<MachineBasicBlock> {
				using InstructionT = MachineInstr;
				using BasicBlockT = MachineBasicBlock;
				using FunctionT = MachineFunction;
				using BlockFrequencyInfoT = MachineBlockFrequencyInfo;
				using LoopT = MachineLoop;
				using LoopInfoPtrT = MachineLoopInfo *;
				using DominatorTreePtrT = MachineDominatorTree *;
				using PostDominatorTreePtrT = MachinePostDominatorTree *;
				using PostDominatorTreeT = MachinePostDominatorTree;
				using OptRemarkEmitterT = MachineOptimizationRemarkEmitter;
				using OptRemarkAnalysisT = MachineOptimizationRemarkAnalysis;
				static Function &getFunction(MachineFunction &F) { return F.getFunction(); }
				static const MachineBasicBlock *getEntryBB(const MachineFunction *F) {
				return GraphTraits<const MachineFunction *>::getEntryNode(F);
				}
				};
				} // namespace afdo_detail

				class FSProfileLoader final
				: public SampleProfileLoaderBaseImpl<MachineBasicBlock> {
				public:
				void setInitVals(MachineDominatorTree *MDT_, MachinePostDominatorTree *MPDT_,
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for parameters 'MDT_' and 'MPDT_' [readability-identifier-naming]
				MachineLoopInfo *MLI_, MachineBlockFrequencyInfo *MBFI_,
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for parameters 'MLI_' and 'MBFI_' [readability-identifier-naming]
				MachineOptimizationRemarkEmitter *ORE_) {
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for parameter 'ORE_' [readability-identifier-naming]
				DT = MDT_;
				PDT = MPDT_;
				LI = MLI_;
				MBFI = MBFI_;
				ORE = ORE_;
				}
				void setMaskBitVals(unsigned LowBit_, unsigned HighBit_) {
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for parameters 'LowBit_' and 'HighBit_' [readability-identifier-naming]
				LowBit = LowBit_;
				HighBit = HighBit_;
				}

				FSProfileLoader(StringRef Name)
				: SampleProfileLoaderBaseImpl(std::string(Name)) {}

				void setBranchProbs(MachineFunction &F);
				bool runOnFunction(MachineFunction &F);
				bool doInitialization(Module &M);
				bool isValid() const { return ProfileIsValid; }

				protected:
				friend class SampleCoverageTracker;

				/// Hold the information of the basic block frequency.
				MachineBlockFrequencyInfo *MBFI;

				// LowBit in the FS discriminator used by this instance. Note the number is
				// 0-based. Base discriminator uses bit 0 to bit 11.
				unsigned LowBit;
				// HighBit in the FS discriminator used by this instance. Note the number
				// is 0-based.
				unsigned HighBit;

				bool ProfileIsValid = true;
				};

				template <>
				void SampleProfileLoaderBaseImpl<
				MachineBasicBlock>::computeDominanceAndLoopInfo(MachineFunction &F) {}

				/// Build in/out edge lists for each basic block in the CFG.
				///
				/// We are interested in unique edges. If a block B1 has multiple
				/// edges to another block B2, we only add a single B1->B2 edge.
				template <>
				void SampleProfileLoaderBaseImpl<MachineBasicBlock>::buildEdges(FunctionT &F) {
				for (auto &BI : F) {
				BasicBlockT *B1 = &BI;

				// Add predecessors for B1.
				SmallPtrSet<BasicBlockT *, 16> Visited;
				if (!Predecessors[B1].empty())
				llvm_unreachable("Found a stale predecessors list in a basic block.");
				for (auto *B2 : B1->predecessors()) {
				if (Visited.insert(B2).second)
				Predecessors[B1].push_back(B2);
				}

				// Add successors for B1.
				Visited.clear();
				if (!Successors[B1].empty())
				llvm_unreachable("Found a stale successors list in a basic block.");
				for (auto *B2 : B1->successors()) {
				if (Visited.insert(B2).second)
				Successors[B1].push_back(B2);
				}
				}
				}

				void FSProfileLoader::setBranchProbs(MachineFunction &F) {
				LLVM_DEBUG(dbgs() << "\nPropagation complete. Setting branch probs\n");
				for (auto &BI : F) {
				MachineBasicBlock *BB = &BI;
				if (BB->succ_size() < 2)
				continue;
				const MachineBasicBlock *EC = EquivalenceClass[BB];
				uint64_t BBWeight = BlockWeights[EC];
				uint64_t SumEdgeWeight = 0;
				for (MachineBasicBlock::succ_iterator SI = BB->succ_begin(),
				SE = BB->succ_end();
				SI != SE; ++SI) {
				MachineBasicBlock *Succ = *SI;
				Edge E = std::make_pair(BB, Succ);
				SumEdgeWeight += EdgeWeights[E];
				}

				if (BBWeight != SumEdgeWeight) {
				LLVM_DEBUG(dbgs() << "BBWeight is not equal to SumEdgeWeight: BBWeight="
				<< BBWeight << " SumEdgeWeight= " << SumEdgeWeight
				<< "\n");
				BBWeight = SumEdgeWeight;
				}
				if (BBWeight == 0) {
				LLVM_DEBUG(dbgs() << "SKIPPED. All branch weights are zero.\n");
				continue;
				}

				uint64_t BBWeight_Orig = BBWeight;
				uint32_t MaxWeight = std::numeric_limits<uint32_t>::max();
				uint32_t Factor = 1;
				if (BBWeight > MaxWeight) {
				Factor = BBWeight / MaxWeight + 1;
				BBWeight /= Factor;
				LLVM_DEBUG(dbgs() << "Scaling weights by " << Factor << "\n");
				}

				for (MachineBasicBlock::succ_iterator SI = BB->succ_begin(),
				SE = BB->succ_end();
				SI != SE; ++SI) {
				MachineBasicBlock *Succ = *SI;
				Edge E = std::make_pair(BB, Succ);
				uint64_t EdgeWeight = EdgeWeights[E];
				EdgeWeight /= Factor;

				assert(BBWeight >= EdgeWeight &&
				"BBWeight is smaller than EdgeWeight -- should not happen.\n");

				BranchProbability OldProb = MBFI->getMBPI()->getEdgeProbability(BB, SI);
				BranchProbability NewProb(EdgeWeight, BBWeight);
				if (OldProb == NewProb)
				continue;
				BB->setSuccProbability(SI, NewProb);
				bool Show = false;
				BranchProbability Diff;
				if (OldProb > NewProb)
				Diff = OldProb - NewProb;
				else
				Diff = NewProb - OldProb;
				Show = (Diff >=
				BranchProbability(FSProfileDebugProbDiffThreshold, 100));
				Show &= (BBWeight_Orig >= FSProfileDebugBWThreshold);

				auto DIL = BB->findBranchDebugLoc();
				auto SuccDIL = Succ->findBranchDebugLoc();
				if (Show) {
				dbgs() << "Set branch fs prob: MBB (" << BB->getNumber() << " -> "
				<< Succ->getNumber() << "): ";
				if (DIL)
				dbgs() << DIL->getFilename() << ":" << DIL->getLine() << ":"
				<< DIL->getColumn();
				if (SuccDIL)
				dbgs() << "-->" << SuccDIL->getFilename() << ":" << SuccDIL->getLine()
				<< ":" << SuccDIL->getColumn();
				dbgs() << " W=" << BBWeight_Orig << " " << OldProb << " --> "
				<< NewProb << "\n";
				}
				}
				}
				}
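The weight scaling inside setBranchProbs can be read in isolation. Below is a minimal standalone sketch of that step, assuming plain integers instead of the LLVM types; `ScaledWeights` and `scaleToUInt32` are hypothetical names that are not part of this patch. Unlike the patch, the sketch keeps the factor in 64 bits so that it cannot wrap for block weights close to the 64-bit limit.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical result type for this sketch (not in the patch).
struct ScaledWeights {
  uint64_t BBWeight;           // block weight after scaling
  uint64_t Factor;             // divisor applied to all weights
  std::vector<uint64_t> Edges; // outgoing edge weights after scaling
};

// Scale a block weight and its outgoing edge weights by a common factor so
// that the block weight fits in 32 bits, mirroring the arithmetic used in
// FSProfileLoader::setBranchProbs.
inline ScaledWeights scaleToUInt32(uint64_t BBWeight,
                                   const std::vector<uint64_t> &EdgeWeights) {
  const uint64_t MaxWeight = std::numeric_limits<uint32_t>::max();
  uint64_t Factor = 1;
  if (BBWeight > MaxWeight) {
    Factor = BBWeight / MaxWeight + 1;
    BBWeight /= Factor;
  }
  ScaledWeights R{BBWeight, Factor, {}};
  for (uint64_t W : EdgeWeights) {
    uint64_t Scaled = W / Factor;
    // Holds whenever every input edge weight is <= the input block weight,
    // because integer division by a common factor is monotonic.
    assert(Scaled <= R.BBWeight && "edge weight exceeds block weight");
    R.Edges.push_back(Scaled);
  }
  return R;
}
```

For example, a block weight of 2^33 with two edges of 2^32 each gives a factor of 3. The scaled values then form the numerator and denominator of each BranchProbability.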

				bool FSProfileLoader::doInitialization(Module &M) {
				auto &Ctx = M.getContext();

				auto ReaderOrErr = sampleprof::SampleProfileReader::create(Filename, Ctx, "");
				if (std::error_code EC = ReaderOrErr.getError()) {
				std::string Msg = "Could not open profile: " + EC.message();
				Ctx.diagnose(DiagnosticInfoSampleProfile(Filename, Msg));
				return false;
				}

				Reader = std::move(ReaderOrErr.get());
				Reader->setModule(&M);
				Reader->setDiscriminatorMaskedBitFrom(HighBit);
				ProfileIsValid = (Reader->read() == sampleprof_error::success);
				Reader->getSummary();

				return true;
				}

				bool FSProfileLoader::runOnFunction(FunctionT &F) {
				Function &Func = F.getFunction();
				Samples = Reader->getSamplesFor(Func);
				if (!Samples || Samples->empty())
				return false;

				if (getFunctionLoc(F) == 0)
				return false;

				DenseSet<GlobalValue::GUID> InlinedGUIDs;
				bool Changed = computeAndPropagateWeights(F, InlinedGUIDs);

				// Set the new BPI, BFI.
				if (EnableFSBranchProb)
				setBranchProbs(F);

				return Changed;
				}

				} // namespace llvm

				bool AddFSDiscriminators::runOnMachineFunction(MachineFunction &mf) {
				if (!EnableFSDiscriminator)
				return false;

				bool Changed = false;
				using Location = std::pair<StringRef, unsigned>;
				using LocationDiscriminator = std::pair<Location, unsigned>;
				using BBSet = DenseSet<const MachineBasicBlock *>;
				using LocationDiscriminatorBBMap = DenseMap<LocationDiscriminator, BBSet>;
				using LocationDiscriminatorCurrPassMap =
				DenseMap<LocationDiscriminator, unsigned>;

				MF = &mf;
				LocationDiscriminatorBBMap LDBM;
				LocationDiscriminatorCurrPassMap LDCM;

				// Mask of discriminators before this pass.
				unsigned BitMaskBefore = (1 << LowBit) - 1;
				// Mask of discriminators including this pass.
				unsigned BitMaskNow = (1 << (HighBit + 1)) - 1;
				// Mask of discriminators for bits specific to this pass.
				unsigned BitMaskThisPass = BitMaskNow ^ BitMaskBefore;
				unsigned NumNewD = 0;

				LLVM_DEBUG(dbgs() << "AddFSDiscriminators working on Func: "
				<< MF->getFunction().getName() << "\n");
				for (MachineBasicBlock &BB : *MF) {
				for (MachineInstr &I : BB) {
				const DILocation *DIL = I.getDebugLoc().get();
				if (!DIL)
				continue;
				unsigned LineNo = DIL->getLine();
				if (LineNo == 0)
				continue;
				Location L = std::make_pair(DIL->getFilename(), LineNo);
				unsigned Discriminator = DIL->getDiscriminator();
				Discriminator &= BitMaskBefore;
				LocationDiscriminator LD = std::make_pair(L, Discriminator);
				auto &BBMap = LDBM[LD];
				auto R = BBMap.insert(&BB);
				if (BBMap.size() == 1)
				continue;
				unsigned DiscriminatorCurrPass;

				DiscriminatorCurrPass = R.second ? ++LDCM[LD] : LDCM[LD];
				DiscriminatorCurrPass = DiscriminatorCurrPass << LowBit;
				DiscriminatorCurrPass += getCallStackHash(BB, I, DIL);
				DiscriminatorCurrPass &= BitMaskThisPass;
				unsigned NewD = Discriminator | DiscriminatorCurrPass;
				auto NewDIL = DIL->cloneWithDiscriminator(NewD);
				if (!NewDIL) {
				LLVM_DEBUG(dbgs() << "Could not encode discriminator: "
				<< DIL->getFilename() << ":" << DIL->getLine() << ":"
				<< DIL->getColumn() << ":" << Discriminator << " "
				<< I << "\n");
				} else {
				I.setDebugLoc(NewDIL);
				NumNewD++;
				LLVM_DEBUG(dbgs() << DIL->getFilename() << ":" << DIL->getLine() << ":"
				<< DIL->getColumn() << ": from " << Discriminator
				<< " -> " << NewD << " DC is " << NumNewD << "\n");
				}
				Changed = true;
				}
				}

				if (Changed) {
				LLVM_DEBUG(dbgs() << "Num of LDBB: " << LDBM.size()
				<< " Num of New D: " << NumNewD << "\n");
				}

				return Changed;
				}
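The mask arithmetic in AddFSDiscriminators::runOnMachineFunction may be easier to follow outside the pass. The following is a rough sketch under stated assumptions: `FSMasks`, `computeFSMasks` and `encodeDiscriminator` are hypothetical helpers, the `Hash` argument stands in for the value returned by getCallStackHash, and the bit range 12..17 used in the note below is an illustrative choice because the values of PASS_1_DIS_BIT_BEG and PASS_1_DIS_BIT_END are not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the three masks computed by the pass.
struct FSMasks {
  uint32_t Before;   // bits owned by earlier passes: [0, LowBit)
  uint32_t Now;      // bits up to and including this pass: [0, HighBit]
  uint32_t ThisPass; // bits owned by this pass only: [LowBit, HighBit]
};

inline FSMasks computeFSMasks(unsigned LowBit, unsigned HighBit) {
  // The sketch assumes the shifts stay inside a 32-bit value.
  assert(LowBit <= HighBit && HighBit < 31 && "bit range out of bounds");
  FSMasks M;
  M.Before = (1u << LowBit) - 1;
  M.Now = (1u << (HighBit + 1)) - 1;
  M.ThisPass = M.Now ^ M.Before;
  return M;
}

// Combine the bits inherited from earlier passes with this pass's sequence
// number and hash, following the same order of operations as the loop body:
// shift, add the hash, mask to this pass's bits, then OR with the old bits.
inline uint32_t encodeDiscriminator(uint32_t OldD, uint32_t Seq, uint32_t Hash,
                                    unsigned LowBit, unsigned HighBit) {
  FSMasks M = computeFSMasks(LowBit, HighBit);
  uint32_t Base = OldD & M.Before;
  uint32_t Cur = ((Seq << LowBit) + Hash) & M.ThisPass;
  return Base | Cur;
}
```

With the assumed range 12..17, the masks are 0xFFF, 0x3FFFF and 0x3F000. Any hash bits below LowBit are discarded by the final mask, so earlier passes' bits are never overwritten.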

				bool FSProfileLoaderPass::runOnMachineFunction(MachineFunction &mf) {
				if (DisableFSProfileLoader)
				return false;
				if (!FSSampleLoader->isValid())
				return false;

				MF = &mf;
				LLVM_DEBUG(dbgs() << "FSProfileLoader pass working on Func: "
				<< MF->getFunction().getName() << "\n");
				MBFI = &getAnalysis<MachineBlockFrequencyInfo>();
				FSSampleLoader->setInitVals(
				&getAnalysis<MachineDominatorTree>(),
				&getAnalysis<MachinePostDominatorTree>(), &getAnalysis<MachineLoopInfo>(),
				MBFI, &getAnalysis<MachineOptimizationRemarkEmitterPass>().getORE());

				MF->RenumberBlocks();
				if (ViewBFIBefore && ViewBlockLayoutWithBFI != GVDT_None &&
				(ViewBlockFreqFuncName.empty() ||
				MF->getFunction().getName().equals(ViewBlockFreqFuncName))) {
				MBFI->view("FSP_b." + MF->getName(), false);
				}

				bool Changed = FSSampleLoader->runOnFunction(mf);

				if (ViewBFIAfter && ViewBlockLayoutWithBFI != GVDT_None &&
				(ViewBlockFreqFuncName.empty() ||
				MF->getFunction().getName().equals(ViewBlockFreqFuncName))) {
				MBFI->view("FSP_a." + MF->getName(), false);
				}

				return Changed;
				}

				bool FSProfileLoaderPass::doInitialization(Module &M) {
				if (DisableFSProfileLoader)
				return false;
				LLVM_DEBUG(dbgs() << "FSProfileLoader pass working on Module " << M.getName()
				<< "\n");

				FSSampleLoader->setMaskBitVals(LowBit, HighBit);
				return FSSampleLoader->doInitialization(M);
				}

				void FSProfileLoaderPass::getAnalysisUsage(AnalysisUsage &AU) const {
				AU.setPreservesAll();
				AU.addRequired<MachineBlockFrequencyInfo>();
				AU.addRequired<MachineDominatorTree>();
				AU.addRequired<MachinePostDominatorTree>();
				AU.addRequiredTransitive<MachineLoopInfo>();
				AU.addRequired<MachineOptimizationRemarkEmitterPass>();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

llvm/lib/CodeGen/TargetPassConfig.cpp

Show All 34 Lines
#include "llvm/MC/MCAsmInfo.h"		#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCTargetOptions.h"		#include "llvm/MC/MCTargetOptions.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/CodeGen.h"		#include "llvm/Support/CodeGen.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
		#include "llvm/Support/FSAFDODiscriminator.h"
#include "llvm/Support/SaveAndRestore.h"		#include "llvm/Support/SaveAndRestore.h"
#include "llvm/Support/Threading.h"		#include "llvm/Support/Threading.h"
#include "llvm/Target/CGPassBuilderOption.h"		#include "llvm/Target/CGPassBuilderOption.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils.h"		#include "llvm/Transforms/Utils.h"
#include "llvm/Transforms/Utils/SymbolRewriter.h"		#include "llvm/Transforms/Utils/SymbolRewriter.h"
#include <cassert>		#include <cassert>
▲ Show 20 Lines • Show All 109 Lines • ▼ Show 20 Lines	static cl::opt<GlobalISelAbortMode> EnableGlobalISelAbort(
cl::desc("Enable abort calls when \"global\" instruction selection "		cl::desc("Enable abort calls when \"global\" instruction selection "
"fails to lower/select an instruction"),		"fails to lower/select an instruction"),
cl::values(		cl::values(
clEnumValN(GlobalISelAbortMode::Disable, "0", "Disable the abort"),		clEnumValN(GlobalISelAbortMode::Disable, "0", "Disable the abort"),
clEnumValN(GlobalISelAbortMode::Enable, "1", "Enable the abort"),		clEnumValN(GlobalISelAbortMode::Enable, "1", "Enable the abort"),
clEnumValN(GlobalISelAbortMode::DisableWithDiag, "2",		clEnumValN(GlobalISelAbortMode::DisableWithDiag, "2",
"Disable the abort but emit a diagnostic on failure")));		"Disable the abort but emit a diagnostic on failure")));

		extern cl::opt<std::string> FSProfileFile;
		extern cl::opt<bool> EnableFSDiscriminator;

		static cl::opt<bool> DisableFSP2("disable-fsafdo-p2", cl::init(true),
		cl::Hidden,
		cl::desc("Disable FS Sampleloader Pass2"));

		static cl::opt<bool>
		FSNoFinalDiscrim("fs-no-final_discrim", cl::Hidden,
		cl::desc("No final discriminator in flow sensitive AFDO."));

// Temporary option to allow experimenting with MachineScheduler as a post-RA		// Temporary option to allow experimenting with MachineScheduler as a post-RA
// scheduler. Targets can "properly" enable this with		// scheduler. Targets can "properly" enable this with
// substitutePass(&PostRASchedulerID, &PostMachineSchedulerID).		// substitutePass(&PostRASchedulerID, &PostMachineSchedulerID).
// Targets can return true in targetSchedulesPostRAScheduling() and		// Targets can return true in targetSchedulesPostRAScheduling() and
// insert a PostRA scheduling pass wherever it wants.		// insert a PostRA scheduling pass wherever it wants.
static cl::opt<bool> MISchedPostRA(		static cl::opt<bool> MISchedPostRA(
"misched-postra", cl::Hidden,		"misched-postra", cl::Hidden,
cl::desc(		cl::desc(
▲ Show 20 Lines • Show All 981 Lines • ▼ Show 20 Lines	if (getOptLevel() != CodeGenOpt::None)
addBlockPlacement();		addBlockPlacement();

// Insert before XRay Instrumentation.		// Insert before XRay Instrumentation.
addPass(&FEntryInserterID);		addPass(&FEntryInserterID);

addPass(&XRayInstrumentationID);		addPass(&XRayInstrumentationID);
addPass(&PatchableFunctionID);		addPass(&PatchableFunctionID);

		if (EnableFSDiscriminator && !FSNoFinalDiscrim)
		addPass(createAddFSDiscriminatorsPass(PASS_LAST_DIS_BIT_BEG,
		davidxl: Is it necessary for this pass? BranchFolding does not create new clones, but merge them, so discriminator subsections can be reused (even though after the branch folding, some of the discriminator in that section gets removed)?
		xur: I had this is mainly for the tail duplication in block placement pass. Conceptually it for all the new clones in the pipeline. I have the statistics of the counter for different rounds of discriminators. We do have plenty of this. That said, more of the performance are from the round before block placement. If I disable this, the performance change is within the noise range (for the benchmark I used).
		davidxl: Can probably make use this discriminator range for other purposes (target optimizations)
		PASS_LAST_DIS_BIT_END));
addPreEmitPass();		addPreEmitPass();

if (TM->Options.EnableIPRA)		if (TM->Options.EnableIPRA)
// Collect register usage information and produce a register mask of		// Collect register usage information and produce a register mask of
// clobbered registers, to be used to optimize call sites.		// clobbered registers, to be used to optimize call sites.
addPass(createRegUsageInfoCollector());		addPass(createRegUsageInfoCollector());

// FIXME: Some backends are incompatible with running the verifier after		// FIXME: Some backends are incompatible with running the verifier after
▲ Show 20 Lines • Show All 257 Lines • ▼ Show 20 Lines
}		}

/// Add standard GC passes.		/// Add standard GC passes.
bool TargetPassConfig::addGCPasses() {		bool TargetPassConfig::addGCPasses() {
addPass(&GCMachineCodeAnalysisID, false);		addPass(&GCMachineCodeAnalysisID, false);
return true;		return true;
}		}

		static std::string getFSProfileFile() {
		if (FSProfileFile.empty() || !EnableFSDiscriminator)
		return std::string();
		dbgs() << "FSProfile is " << FSProfileFile.getValue() << "\n";
		return FSProfileFile.getValue();
		}

/// Add standard basic block placement passes.		/// Add standard basic block placement passes.
void TargetPassConfig::addBlockPlacement() {		void TargetPassConfig::addBlockPlacement() {
		if (EnableFSDiscriminator)
		addPass(
		createAddFSDiscriminatorsPass(PASS_1_DIS_BIT_BEG, PASS_1_DIS_BIT_END));
		std::string FSFile = getFSProfileFile();
		if (!FSFile.empty())
		addPass(createFSProfileLoaderPass(FSFile, PASS_1_DIS_BIT_BEG,
		PASS_1_DIS_BIT_END));
if (addPass(&MachineBlockPlacementID)) {		if (addPass(&MachineBlockPlacementID)) {
// Run a separate pass to collect block placement statistics.		// Run a separate pass to collect block placement statistics.
if (EnableBlockPlacementStats)		if (EnableBlockPlacementStats)
addPass(&MachineBlockPlacementStatsID);		addPass(&MachineBlockPlacementStatsID);
}		}
		if (EnableFSDiscriminator && !DisableFSP2) {
		addPass(
		createAddFSDiscriminatorsPass(PASS_2_DIS_BIT_BEG, PASS_2_DIS_BIT_END));
		std::string FSFile = getFSProfileFile();
		if (!FSFile.empty())
		davidxl: I can see the importance of adding pass to add discriminator after MBP due to tail dup in MBP, but how important (performance wise) it is to load sample profile again for branch folding pass?
		xur: This is from my experiments. The default is off anyway. I will probably remove this.
		davidxl: If it is not important for performance, suggest making it (the sample profile loading part) off by default to avoid unnecessary compile time increase.
		addPass(createFSProfileLoaderPass(FSFile, PASS_2_DIS_BIT_BEG,
		PASS_2_DIS_BIT_END));
		addPass(&BranchFolderPassID);
		}
}		}

//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
/// GlobalISel Configuration		/// GlobalISel Configuration
//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
bool TargetPassConfig::isGlobalISelAbortEnabled() const {		bool TargetPassConfig::isGlobalISelAbortEnabled() const {
return TM->Options.GlobalISelAbort == GlobalISelAbortMode::Enable;		return TM->Options.GlobalISelAbort == GlobalISelAbortMode::Enable;
}		}
Show All 12 Lines

llvm/lib/LTO/LTOBackend.cpp

Show First 20 Lines • Show All 208 Lines • ▼ Show 20 Lines	static void runNewPMPasses(const Config &Conf, Module &Mod, TargetMachine *TM,
ModuleSummaryIndex *ExportSummary,		ModuleSummaryIndex *ExportSummary,
const ModuleSummaryIndex *ImportSummary) {		const ModuleSummaryIndex *ImportSummary) {
Optional<PGOOptions> PGOOpt;		Optional<PGOOptions> PGOOpt;
if (!Conf.SampleProfile.empty())		if (!Conf.SampleProfile.empty())
PGOOpt = PGOOptions(Conf.SampleProfile, "", Conf.ProfileRemapping,		PGOOpt = PGOOptions(Conf.SampleProfile, "", Conf.ProfileRemapping,
PGOOptions::SampleUse, PGOOptions::NoCSAction, true);		PGOOptions::SampleUse, PGOOptions::NoCSAction, true);
else if (Conf.RunCSIRInstr) {		else if (Conf.RunCSIRInstr) {
PGOOpt = PGOOptions("", Conf.CSIRProfile, Conf.ProfileRemapping,		PGOOpt = PGOOptions("", Conf.CSIRProfile, Conf.ProfileRemapping,
PGOOptions::IRUse, PGOOptions::CSIRInstr);		PGOOptions::IRUse, PGOOptions::CSIRInstr,
		Conf.AddFSDiscriminator);
} else if (!Conf.CSIRProfile.empty()) {		} else if (!Conf.CSIRProfile.empty()) {
PGOOpt = PGOOptions(Conf.CSIRProfile, "", Conf.ProfileRemapping,		PGOOpt = PGOOptions(Conf.CSIRProfile, "", Conf.ProfileRemapping,
PGOOptions::IRUse, PGOOptions::CSIRUse);		PGOOptions::IRUse, PGOOptions::CSIRUse,
		Conf.AddFSDiscriminator);
		} else if (Conf.AddFSDiscriminator) {
		PGOOpt = PGOOptions("", "", "", PGOOptions::NoAction,
		PGOOptions::NoCSAction, true);
}		}

PassInstrumentationCallbacks PIC;		PassInstrumentationCallbacks PIC;
StandardInstrumentations SI(Conf.DebugPassManager);		StandardInstrumentations SI(Conf.DebugPassManager);
SI.registerCallbacks(PIC);		SI.registerCallbacks(PIC);
PassBuilder PB(Conf.DebugPassManager, TM, Conf.PTO, PGOOpt, &PIC);		PassBuilder PB(Conf.DebugPassManager, TM, Conf.PTO, PGOOpt, &PIC);

RegisterPassPlugins(Conf.PassPlugins, PB);		RegisterPassPlugins(Conf.PassPlugins, PB);
▲ Show 20 Lines • Show All 475 Lines • Show Last 20 Lines

llvm/lib/Passes/PassBuilder.cpp

Show First 20 Lines • Show All 301 Lines • ▼ Show 20 Lines
extern cl::opt<bool> EnableLoopInterchange;		extern cl::opt<bool> EnableLoopInterchange;
extern cl::opt<bool> EnableUnrollAndJam;		extern cl::opt<bool> EnableUnrollAndJam;
extern cl::opt<bool> EnableLoopFlatten;		extern cl::opt<bool> EnableLoopFlatten;
extern cl::opt<bool> RunNewGVN;		extern cl::opt<bool> RunNewGVN;
extern cl::opt<bool> RunPartialInlining;		extern cl::opt<bool> RunPartialInlining;

extern cl::opt<bool> FlattenedProfileUsed;		extern cl::opt<bool> FlattenedProfileUsed;

		extern cl::opt<std::string> FSProfileFile;
extern cl::opt<AttributorRunOption> AttributorRun;		extern cl::opt<AttributorRunOption> AttributorRun;
extern cl::opt<bool> EnableKnowledgeRetention;		extern cl::opt<bool> EnableKnowledgeRetention;

extern cl::opt<bool> EnableMatrix;		extern cl::opt<bool> EnableMatrix;

extern cl::opt<bool> DisablePreInliner;		extern cl::opt<bool> DisablePreInliner;
extern cl::opt<int> PreInlineThreshold;		extern cl::opt<int> PreInlineThreshold;

▲ Show 20 Lines • Show All 750 Lines • ▼ Show 20 Lines	PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
// In SamplePGO ThinLTO backend, we need instcombine before profile annotation		// In SamplePGO ThinLTO backend, we need instcombine before profile annotation
// to convert bitcast to direct calls so that they can be inlined during the		// to convert bitcast to direct calls so that they can be inlined during the
// profile annotation preparation step.		// profile annotation preparation step.
// More details about SamplePGO design can be found in:		// More details about SamplePGO design can be found in:
// https://research.google.com/pubs/pub45290.html		// https://research.google.com/pubs/pub45290.html
// FIXME: revisit how SampleProfileLoad/Inliner/ICP is structured.		// FIXME: revisit how SampleProfileLoad/Inliner/ICP is structured.
if (LoadSampleProfile)		if (LoadSampleProfile)
EarlyFPM.addPass(InstCombinePass());		EarlyFPM.addPass(InstCombinePass());

MPM.addPass(createModuleToFunctionPassAdaptor(std::move(EarlyFPM)));		MPM.addPass(createModuleToFunctionPassAdaptor(std::move(EarlyFPM)));

if (LoadSampleProfile) {		if (LoadSampleProfile) {
// Annotate sample profile right after early FPM to ensure freshness of		// Annotate sample profile right after early FPM to ensure freshness of
// the debug info.		// the debug info.
MPM.addPass(SampleProfileLoaderPass(PGOOpt->ProfileFile,		MPM.addPass(SampleProfileLoaderPass(PGOOpt->ProfileFile,
PGOOpt->ProfileRemappingFile, Phase));		PGOOpt->ProfileRemappingFile, Phase));
// Cache ProfileSummaryAnalysis once to avoid the potential need to insert		// Cache ProfileSummaryAnalysis once to avoid the potential need to insert
// RequireAnalysisPass for PSI before subsequent non-module passes.		// RequireAnalysisPass for PSI before subsequent non-module passes.
MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());		MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());
// Do not invoke ICP in the LTOPrelink phase as it makes it hard		// Do not invoke ICP in the LTOPrelink phase as it makes it hard
// for the profile annotation to be accurate in the LTO backend.		// for the profile annotation to be accurate in the LTO backend.
if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink &&		if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink &&
Phase != ThinOrFullLTOPhase::FullLTOPreLink)		Phase != ThinOrFullLTOPhase::FullLTOPreLink) {
// We perform early indirect call promotion here, before globalopt.		// We perform early indirect call promotion here, before globalopt.
// This is important for the ThinLTO backend phase because otherwise		// This is important for the ThinLTO backend phase because otherwise
// imported available_externally functions look unreferenced and are		// imported available_externally functions look unreferenced and are
// removed.		// removed.
MPM.addPass(		MPM.addPass(
PGOIndirectCallPromotion(true /* IsInLTO */, true /* SamplePGO */));		PGOIndirectCallPromotion(true /* IsInLTO */, true /* SamplePGO */));

		// Set FSProfileFile so that CodeGen can read the profile.
		FSProfileFile.setValue(PGOOpt->ProfileFile);
		}
}		}

if (AttributorRun & AttributorRunOption::MODULE)		if (AttributorRun & AttributorRunOption::MODULE)
MPM.addPass(AttributorPass());		MPM.addPass(AttributorPass());

// Lower type metadata and the type.test intrinsic in the ThinLTO		// Lower type metadata and the type.test intrinsic in the ThinLTO
// post link pipeline after ICP. This is to enable usage of the type		// post link pipeline after ICP. This is to enable usage of the type
// tests in ICP sequences.		// tests in ICP sequences.
▲ Show 20 Lines • Show All 500 Lines • ▼ Show 20 Lines	PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
if (PGOOpt && PGOOpt->Action == PGOOptions::SampleUse) {		if (PGOOpt && PGOOpt->Action == PGOOptions::SampleUse) {
// Load sample profile before running the LTO optimization pipeline.		// Load sample profile before running the LTO optimization pipeline.
MPM.addPass(SampleProfileLoaderPass(PGOOpt->ProfileFile,		MPM.addPass(SampleProfileLoaderPass(PGOOpt->ProfileFile,
PGOOpt->ProfileRemappingFile,		PGOOpt->ProfileRemappingFile,
ThinOrFullLTOPhase::FullLTOPostLink));		ThinOrFullLTOPhase::FullLTOPostLink));
// Cache ProfileSummaryAnalysis once to avoid the potential need to insert		// Cache ProfileSummaryAnalysis once to avoid the potential need to insert
// RequireAnalysisPass for PSI before subsequent non-module passes.		// RequireAnalysisPass for PSI before subsequent non-module passes.
MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());		MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());

		// Set FSProfileFile so that CodeGen can read the profile.
		FSProfileFile.setValue(PGOOpt->ProfileFile);
}		}

// Remove unused virtual tables to improve the quality of code generated by		// Remove unused virtual tables to improve the quality of code generated by
// whole-program devirtualization and bitset lowering.		// whole-program devirtualization and bitset lowering.
MPM.addPass(GlobalDCEPass());		MPM.addPass(GlobalDCEPass());

// Force any function attributes we want the rest of the pipeline to observe.		// Force any function attributes we want the rest of the pipeline to observe.
MPM.addPass(ForceFunctionAttrsPass());		MPM.addPass(ForceFunctionAttrsPass());
▲ Show 20 Lines • Show All 1,554 Lines • Show Last 20 Lines

llvm/lib/ProfileData/SampleProf.cpp

	Show All 28 Lines

	using namespace llvm;			using namespace llvm;
	using namespace sampleprof;			using namespace sampleprof;

	static cl::opt<uint64_t> ProfileSymbolListCutOff(			static cl::opt<uint64_t> ProfileSymbolListCutOff(
	"profile-symbol-list-cutoff", cl::Hidden, cl::init(-1), cl::ZeroOrMore,			"profile-symbol-list-cutoff", cl::Hidden, cl::init(-1), cl::ZeroOrMore,
	cl::desc("Cutoff value about how many symbols in profile symbol list "			cl::desc("Cutoff value about how many symbols in profile symbol list "
	"will be used. This is very useful for performance debugging"));			"will be used. This is very useful for performance debugging"));
				cl::opt<std::string>
				FSProfileFile("fs-profile-file", cl::init(""), cl::value_desc("filename"),
				cl::desc("Flow Sensitive profile file name."), cl::Hidden);
				cl::opt<bool> EnableFSDiscriminator(
				"enable-fs-discriminator", cl::Hidden, cl::init(false),
				//"enable-fs-discriminator", cl::Hidden, cl::init(true),
				cl::desc("Enable adding flow sensitive discriminators"));

	namespace llvm {			namespace llvm {
	namespace sampleprof {			namespace sampleprof {
	SampleProfileFormat FunctionSamples::Format;			SampleProfileFormat FunctionSamples::Format;
	bool FunctionSamples::ProfileIsProbeBased = false;			bool FunctionSamples::ProfileIsProbeBased = false;
	bool FunctionSamples::ProfileIsCS = false;			bool FunctionSamples::ProfileIsCS = false;
	bool FunctionSamples::UseMD5 = false;			bool FunctionSamples::UseMD5 = false;
	bool FunctionSamples::HasUniqSuffix = true;			bool FunctionSamples::HasUniqSuffix = true;
	▲ Show 20 Lines • Show All 157 Lines • ▼ Show 20 Lines

	raw_ostream &llvm::sampleprof::operator<<(raw_ostream &OS,			raw_ostream &llvm::sampleprof::operator<<(raw_ostream &OS,
	const FunctionSamples &FS) {			const FunctionSamples &FS) {
	FS.print(OS);			FS.print(OS);
	return OS;			return OS;
	}			}

	unsigned FunctionSamples::getOffset(const DILocation *DIL) {			unsigned FunctionSamples::getOffset(const DILocation *DIL) {
	return (DIL->getLine() - DIL->getScope()->getSubprogram()->getLine()) &			unsigned Offset =
	0xffff;			DIL->getLine() - DIL->getScope()->getSubprogram()->getLine();
				if (EnableFSDiscriminator)
				return Offset;
				return Offset & 0xffff;
	}			}
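The change to FunctionSamples::getOffset above only affects whether the line offset is truncated to 16 bits. A small sketch of that difference, assuming raw line numbers instead of a DILocation; `lineOffset` and its `KeepAllBits` parameter are hypothetical stand-ins for the function and for the EnableFSDiscriminator flag:

```cpp
// Offset of a line relative to the first line of its subprogram.
// KeepAllBits == false models the existing behavior (low 16 bits only);
// KeepAllBits == true models the behavior with FS discriminators enabled.
// Unsigned subtraction wraps if Line < SubprogramLine, as in the original.
constexpr unsigned lineOffset(unsigned Line, unsigned SubprogramLine,
                              bool KeepAllBits) {
  return KeepAllBits ? (Line - SubprogramLine)
                     : ((Line - SubprogramLine) & 0xffffu);
}
```

For a function starting at line 10, a location on line 70010 has offset 70000 when all bits are kept and 4464 when masked, so the two modes only disagree for offsets of 65536 or more, or when the subtraction wraps.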

	LineLocation FunctionSamples::getCallSiteIdentifier(const DILocation *DIL) {			LineLocation FunctionSamples::getCallSiteIdentifier(const DILocation *DIL) {
	if (FunctionSamples::ProfileIsProbeBased)			if (FunctionSamples::ProfileIsProbeBased)
	// In a pseudo-probe based profile, a callsite is simply represented by the			// In a pseudo-probe based profile, a callsite is simply represented by the
	// ID of the probe associated with the call instruction. The probe ID is			// ID of the probe associated with the call instruction. The probe ID is
	// encoded in the Discriminator field of the call instruction's debug			// encoded in the Discriminator field of the call instruction's debug
	// metadata.			// metadata.
	return LineLocation(PseudoProbeDwarfDiscriminator::extractProbeIndex(			return LineLocation(PseudoProbeDwarfDiscriminator::extractProbeIndex(
	DIL->getDiscriminator()),			DIL->getDiscriminator()),
	0);			0);
	else			else
	return LineLocation(FunctionSamples::getOffset(DIL),			return LineLocation(FunctionSamples::getOffset(DIL),
	DIL->getBaseDiscriminator());			DIL->getBaseDiscriminator());
	}			}

	const FunctionSamples *FunctionSamples::findFunctionSamples(			const FunctionSamples *FunctionSamples::findFunctionSamples(
	const DILocation *DIL, SampleProfileReaderItaniumRemapper *Remapper) const {			const DILocation *DIL, SampleProfileReaderItaniumRemapper *Remapper) const {
	assert(DIL);			assert(DIL);
	SmallVector<std::pair<LineLocation, StringRef>, 10> S;			SmallVector<std::pair<LineLocation, StringRef>, 10> S;

	const DILocation *PrevDIL = DIL;			const DILocation *PrevDIL = DIL;
	for (DIL = DIL->getInlinedAt(); DIL; DIL = DIL->getInlinedAt()) {			for (DIL = DIL->getInlinedAt(); DIL; DIL = DIL->getInlinedAt()) {
	S.push_back(std::make_pair(			unsigned Discriminator;
	LineLocation(getOffset(DIL), DIL->getBaseDiscriminator()),			if (EnableFSDiscriminator)
				Discriminator = DIL->getDiscriminator();
				else
				Discriminator = DIL->getBaseDiscriminator();

				S.push_back(
				std::make_pair(LineLocation(getOffset(DIL), Discriminator),
	PrevDIL->getScope()->getSubprogram()->getLinkageName()));			PrevDIL->getScope()->getSubprogram()->getLinkageName()));
	PrevDIL = DIL;			PrevDIL = DIL;
	}			}
	if (S.size() == 0)			if (S.size() == 0)
	return this;			return this;
	const FunctionSamples *FS = this;			const FunctionSamples *FS = this;
	for (int i = S.size() - 1; i >= 0 && FS != nullptr; i--) {			for (int i = S.size() - 1; i >= 0 && FS != nullptr; i--) {
	FS = FS->findFunctionSamplesAt(S[i].first, S[i].second, Remapper);			FS = FS->findFunctionSamplesAt(S[i].first, S[i].second, Remapper);
	}			}
	▲ Show 20 Lines • Show All 97 Lines • Show Last 20 Lines
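
Note on the getOffset() change above: with the flag off, the line offset is truncated to its low 16 bits, while with -enable-fs-discriminator=true the full offset is returned. The following is only a minimal sketch of that difference. The function lineOffset and its plain integer parameters are hypothetical stand-ins for FunctionSamples::getOffset and the DILocation it reads; they are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for FunctionSamples::getOffset(). It takes plain line
// numbers instead of a DILocation, and a bool instead of the cl::opt flag.
constexpr uint32_t lineOffset(uint32_t Line, uint32_t FuncStartLine,
                              bool EnableFS) {
  uint32_t Offset = Line - FuncStartLine;
  // Without FS discriminators only the low 16 bits of the offset are kept.
  return EnableFS ? Offset : (Offset & 0xffff);
}

// Small self-check of both modes.
void checkLineOffset() {
  // Small offsets are identical in both modes.
  assert(lineOffset(120, 100, false) == 20);
  assert(lineOffset(120, 100, true) == 20);
  // An offset of 70000 (0x11170) does not fit in 16 bits: the old scheme
  // keeps 0x1170 (4464), the FS scheme keeps the full value.
  assert(lineOffset(70100, 100, false) == 4464);
  assert(lineOffset(70100, 100, true) == 70000);
}
```

This also suggests why isOffsetLegal() in the reader has to accept offsets wider than 16 bits when the flag is on.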

llvm/lib/ProfileData/SampleProfReader.cpp

Show All 20 Lines

#include "llvm/ProfileData/SampleProfReader.h"		#include "llvm/ProfileData/SampleProfReader.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/IR/ProfileSummary.h"		#include "llvm/IR/ProfileSummary.h"
#include "llvm/ProfileData/ProfileCommon.h"		#include "llvm/ProfileData/ProfileCommon.h"
#include "llvm/ProfileData/SampleProf.h"		#include "llvm/ProfileData/SampleProf.h"
		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Compression.h"		#include "llvm/Support/Compression.h"
#include "llvm/Support/ErrorOr.h"		#include "llvm/Support/ErrorOr.h"
#include "llvm/Support/LEB128.h"		#include "llvm/Support/LEB128.h"
#include "llvm/Support/LineIterator.h"		#include "llvm/Support/LineIterator.h"
#include "llvm/Support/MD5.h"		#include "llvm/Support/MD5.h"
#include "llvm/Support/MemoryBuffer.h"		#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <algorithm>		#include <algorithm>
#include <cstddef>		#include <cstddef>
#include <cstdint>		#include <cstdint>
#include <limits>		#include <limits>
#include <memory>		#include <memory>
#include <set>		#include <set>
#include <system_error>		#include <system_error>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;
using namespace sampleprof;		using namespace sampleprof;

		#define DEBUG_TYPE "samplepgo-reader"
		extern cl::opt<bool> EnableFSDiscriminator;

/// Dump the function profile for \p FName.		/// Dump the function profile for \p FName.
///		///
/// \param FName Name of the function to print.		/// \param FName Name of the function to print.
/// \param OS Stream to emit the output to.		/// \param OS Stream to emit the output to.
void SampleProfileReader::dumpFunctionProfile(StringRef FName,		void SampleProfileReader::dumpFunctionProfile(StringRef FName,
raw_ostream &OS) {		raw_ostream &OS) {
OS << "Function: " << FName << ": " << Profiles[FName];		OS << "Function: " << FName << ": " << Profiles[FName];
}		}
Show All 21 Lines	static bool ParseHead(const StringRef &Input, StringRef &FName,
if (Input.substr(n1 + 1, n2 - n1 - 1).getAsInteger(10, NumSamples))		if (Input.substr(n1 + 1, n2 - n1 - 1).getAsInteger(10, NumSamples))
return false;		return false;
if (Input.substr(n2 + 1).getAsInteger(10, NumHeadSamples))		if (Input.substr(n2 + 1).getAsInteger(10, NumHeadSamples))
return false;		return false;
return true;		return true;
}		}

/// Returns true if line offset \p L is legal (only has 16 bits).		/// Returns true if line offset \p L is legal (only has 16 bits).
static bool isOffsetLegal(unsigned L) { return (L & 0xffff) == L; }		static bool isOffsetLegal(unsigned L) {
		return EnableFSDiscriminator || (L & 0xffff) == L;
		}

/// Parse \p Input that contains metadata.		/// Parse \p Input that contains metadata.
/// Possible metadata:		/// Possible metadata:
/// - CFG Checksum information:		/// - CFG Checksum information:
/// !CFGChecksum: 12345		/// !CFGChecksum: 12345
/// Stores the FunctionHash (a.k.a. CFG Checksum) into \p FunctionHash.		/// Stores the FunctionHash (a.k.a. CFG Checksum) into \p FunctionHash.
static bool parseMetadata(const StringRef &Input, uint64_t &FunctionHash) {		static bool parseMetadata(const StringRef &Input, uint64_t &FunctionHash) {
if (!Input.startswith("!CFGChecksum:"))		if (!Input.startswith("!CFGChecksum:"))
▲ Show 20 Lines • Show All 130 Lines • ▼ Show 20 Lines	std::error_code SampleProfileReaderText::readImpl() {

InlineCallStack InlineStack;		InlineCallStack InlineStack;
uint32_t ProbeProfileCount = 0;		uint32_t ProbeProfileCount = 0;

// SeenMetadata tracks whether we have processed metadata for the current		// SeenMetadata tracks whether we have processed metadata for the current
// top-level function profile.		// top-level function profile.
bool SeenMetadata = false;		bool SeenMetadata = false;

		#ifndef NDEBUG
		uint64_t FSBucketSamples[6];
		uint32_t FSBucketRecords[6];
		for (int i = 0; i < 6; i++) {
		FSBucketSamples[i] = 0;
		FSBucketRecords[i] = 0;
		}
		#endif

for (; !LineIt.is_at_eof(); ++LineIt) {		for (; !LineIt.is_at_eof(); ++LineIt) {
if ((*LineIt)[(*LineIt).find_first_not_of(' ')] == '#')		if ((*LineIt)[(*LineIt).find_first_not_of(' ')] == '#')
continue;		continue;
// Read the header of each function.		// Read the header of each function.
//		//
// Note that for function identifiers we are actually expecting		// Note that for function identifiers we are actually expecting
// mangled names, but we may not always get them. This happens when		// mangled names, but we may not always get them. This happens when
// the compiler decides not to emit the function (e.g., it was inlined		// the compiler decides not to emit the function (e.g., it was inlined
// and removed). In this case, the binary will not have the linkage		// and removed). In this case, the binary will not have the linkage
// name for the function, so the profiler will emit the function's		// name for the function, so the profiler will emit the function's
// unmangled name, which may contain characters like ':' and '>' in its		// unmangled name, which may contain characters like ':' and '>' in its
// name (member functions, templates, etc).		// name (member functions, templates, etc).
Show All 35 Lines	if ((*LineIt)[0] != ' ') {
return sampleprof_error::malformed;		return sampleprof_error::malformed;
}		}
if (SeenMetadata && LineTy != LineType::Metadata) {		if (SeenMetadata && LineTy != LineType::Metadata) {
// Metadata must be put at the end of a function profile.		// Metadata must be put at the end of a function profile.
reportError(LineIt.line_number(),		reportError(LineIt.line_number(),
"Found non-metadata after metadata: " + *LineIt);		"Found non-metadata after metadata: " + *LineIt);
return sampleprof_error::malformed;		return sampleprof_error::malformed;
}		}

		// Here we handle FS discriminators:
		uint32_t MaskedDiscriminator = Discriminator;
		MaskedDiscriminator &= getDiscriminatorMask();
		#ifndef NDEBUG
		int Bucket = getFSBucket(Discriminator);
		FSBucketRecords[Bucket] += 1;
		FSBucketSamples[Bucket] += NumSamples;
		#endif

		Discriminator = MaskedDiscriminator;

while (InlineStack.size() > Depth) {		while (InlineStack.size() > Depth) {
InlineStack.pop_back();		InlineStack.pop_back();
}		}
switch (LineTy) {		switch (LineTy) {
case LineType::CallSiteProfile: {		case LineType::CallSiteProfile: {
FunctionSamples &FSamples = InlineStack.back()->functionSamplesAt(		FunctionSamples &FSamples = InlineStack.back()->functionSamplesAt(
LineLocation(LineOffset, Discriminator))[std::string(FName)];		LineLocation(LineOffset, Discriminator))[std::string(FName)];
FSamples.setName(FName);		FSamples.setName(FName);
Show All 33 Lines	assert((ProbeProfileCount == 0 || ProbeProfileCount == Profiles.size()) &&
"Cannot have both probe-based profiles and regular profiles");		"Cannot have both probe-based profiles and regular profiles");
ProfileIsProbeBased = (ProbeProfileCount > 0);		ProfileIsProbeBased = (ProbeProfileCount > 0);
FunctionSamples::ProfileIsProbeBased = ProfileIsProbeBased;		FunctionSamples::ProfileIsProbeBased = ProfileIsProbeBased;
FunctionSamples::ProfileIsCS = ProfileIsCS;		FunctionSamples::ProfileIsCS = ProfileIsCS;

if (Result == sampleprof_error::success)		if (Result == sampleprof_error::success)
computeSummary();		computeSummary();

		#ifndef NDEBUG
		LLVM_DEBUG(dbgs() << "Text reader is done. Statistics:\n");
		for (int i = 0; i < 6; i++) {
		if (FSBucketRecords[i] == 0)
		continue;
		LLVM_DEBUG(dbgs() << "Bucket " << i << ": "
		<< "records=" << FSBucketRecords[i]
		<< " samples=" << FSBucketSamples[i] << "\n");
		}
		#endif

return Result;		return Result;
}		}

bool SampleProfileReaderText::hasFormat(const MemoryBuffer &Buffer) {		bool SampleProfileReaderText::hasFormat(const MemoryBuffer &Buffer) {
bool result = false;		bool result = false;

// Check that the first non-comment line is a valid function header.		// Check that the first non-comment line is a valid function header.
line_iterator LineIt(Buffer, /*SkipBlanks=*/true, '#');		line_iterator LineIt(Buffer, /*SkipBlanks=*/true, '#');
if (!LineIt.is_at_eof()) {		if (!LineIt.is_at_eof()) {
if ((*LineIt)[0] != ' ') {		if ((*LineIt)[0] != ' ') {
uint64_t NumSamples, NumHeadSamples;		uint64_t NumSamples, NumHeadSamples;
StringRef FName;		StringRef FName;
result = ParseHead(*LineIt, FName, NumSamples, NumHeadSamples);		result = ParseHead(*LineIt, FName, NumSamples, NumHeadSamples);
}		}
}		}

return result;		return result;
}		}

▲ Show 20 Lines • Show All 129 Lines • ▼ Show 20 Lines	for (uint32_t I = 0; I < *NumRecords; ++I) {
auto NumSamples = readNumber<uint64_t>();		auto NumSamples = readNumber<uint64_t>();
if (std::error_code EC = NumSamples.getError())		if (std::error_code EC = NumSamples.getError())
return EC;		return EC;

auto NumCalls = readNumber<uint32_t>();		auto NumCalls = readNumber<uint32_t>();
if (std::error_code EC = NumCalls.getError())		if (std::error_code EC = NumCalls.getError())
return EC;		return EC;

		// Here we handle FS discriminators:
		uint32_t DiscriminatorVal = *Discriminator;
		uint32_t MaskedDiscriminator = DiscriminatorVal & getDiscriminatorMask();
		DiscriminatorVal = MaskedDiscriminator;

for (uint32_t J = 0; J < *NumCalls; ++J) {		for (uint32_t J = 0; J < *NumCalls; ++J) {
auto CalledFunction(readStringFromTable());		auto CalledFunction(readStringFromTable());
if (std::error_code EC = CalledFunction.getError())		if (std::error_code EC = CalledFunction.getError())
return EC;		return EC;

auto CalledFunctionSamples = readNumber<uint64_t>();		auto CalledFunctionSamples = readNumber<uint64_t>();
if (std::error_code EC = CalledFunctionSamples.getError())		if (std::error_code EC = CalledFunctionSamples.getError())
return EC;		return EC;

FProfile.addCalledTargetSamples(*LineOffset, *Discriminator,		FProfile.addCalledTargetSamples(*LineOffset, DiscriminatorVal,
*CalledFunction, *CalledFunctionSamples);		*CalledFunction, *CalledFunctionSamples);
}		}

FProfile.addBodySamples(*LineOffset, *Discriminator, *NumSamples);		FProfile.addBodySamples(*LineOffset, DiscriminatorVal, *NumSamples);
}		}

// Read all the samples for inlined function calls.		// Read all the samples for inlined function calls.
auto NumCallsites = readNumber<uint32_t>();		auto NumCallsites = readNumber<uint32_t>();
if (std::error_code EC = NumCallsites.getError())		if (std::error_code EC = NumCallsites.getError())
return EC;		return EC;

for (uint32_t J = 0; J < *NumCallsites; ++J) {		for (uint32_t J = 0; J < *NumCallsites; ++J) {
auto LineOffset = readNumber<uint64_t>();		auto LineOffset = readNumber<uint64_t>();
if (std::error_code EC = LineOffset.getError())		if (std::error_code EC = LineOffset.getError())
return EC;		return EC;

auto Discriminator = readNumber<uint64_t>();		auto Discriminator = readNumber<uint64_t>();
if (std::error_code EC = Discriminator.getError())		if (std::error_code EC = Discriminator.getError())
return EC;		return EC;

auto FName(readStringFromTable());		auto FName(readStringFromTable());
if (std::error_code EC = FName.getError())		if (std::error_code EC = FName.getError())
return EC;		return EC;

		// Here we handle FS discriminators:
		uint32_t DiscriminatorVal = *Discriminator;
		uint32_t MaskedDiscriminator = DiscriminatorVal & getDiscriminatorMask();
		DiscriminatorVal = MaskedDiscriminator;

FunctionSamples &CalleeProfile = FProfile.functionSamplesAt(		FunctionSamples &CalleeProfile = FProfile.functionSamplesAt(
LineLocation(*LineOffset, *Discriminator))[std::string(*FName)];		LineLocation(*LineOffset, DiscriminatorVal))[std::string(*FName)];
CalleeProfile.setName(*FName);		CalleeProfile.setName(*FName);
if (std::error_code EC = readProfile(CalleeProfile))		if (std::error_code EC = readProfile(CalleeProfile))
return EC;		return EC;
}		}

return sampleprof_error::success;		return sampleprof_error::success;
}		}

▲ Show 20 Lines • Show All 1,146 Lines • Show Last 20 Lines
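
Note on the "Here we handle FS discriminators" blocks above: both readers AND each discriminator with getDiscriminatorMask() before using it as a profile key. The sketch below only illustrates that masking. The helpers lowBitsMask and maskDiscriminator are hypothetical, and the reading that the configured bit index N keeps bits 0 through N inclusive is an assumption based on the "0 based" wording of the mask-highbit-from option; the patch's own getDiscriminatorMask() and getN1Bits() are not shown in this excerpt.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a mask with bits 0..HighBit (inclusive) set.
// HighBit >= 31 is special-cased to avoid shifting a 32-bit value by 32.
constexpr uint32_t lowBitsMask(unsigned HighBit) {
  return HighBit >= 31 ? 0xFFFFFFFFu : ((1u << (HighBit + 1)) - 1);
}

// Hypothetical helper mirroring "Discriminator & getDiscriminatorMask()".
constexpr uint32_t maskDiscriminator(uint32_t Discriminator, unsigned HighBit) {
  return Discriminator & lowBitsMask(HighBit);
}

// Small self-check of the masking arithmetic.
void checkMasking() {
  const uint32_t D = 0x00ABC123u;
  // The default of 31 keeps every bit.
  assert(maskDiscriminator(D, 31) == D);
  // 11 keeps only the low 12 bits.
  assert(maskDiscriminator(D, 11) == 0x123u);
  // 17 keeps the low 18 bits.
  assert(maskDiscriminator(D, 17) == 0x3C123u);
}
```

Under this assumption, records that differ only in the higher-round bits collapse onto the same (LineOffset, Discriminator) key once masked, so their samples are accumulated together by addBodySamples().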

llvm/lib/Target/X86/X86InsertPrefetch.cpp

Show First 20 Lines • Show All 161 Lines • ▼ Show 20 Lines	ErrorOr<std::unique_ptr<SampleProfileReader>> ReaderOrErr =
SampleProfileReader::create(Filename, Ctx);		SampleProfileReader::create(Filename, Ctx);
if (std::error_code EC = ReaderOrErr.getError()) {		if (std::error_code EC = ReaderOrErr.getError()) {
std::string Msg = "Could not open profile: " + EC.message();		std::string Msg = "Could not open profile: " + EC.message();
Ctx.diagnose(DiagnosticInfoSampleProfile(Filename, Msg,		Ctx.diagnose(DiagnosticInfoSampleProfile(Filename, Msg,
DiagnosticSeverity::DS_Warning));		DiagnosticSeverity::DS_Warning));
return false;		return false;
}		}
Reader = std::move(ReaderOrErr.get());		Reader = std::move(ReaderOrErr.get());
		Reader->setDiscriminatorMaskedBitFrom(DILocation::getBaseDiscriminatorBits());
Reader->read();		Reader->read();
return true;		return true;
}		}

void X86InsertPrefetch::getAnalysisUsage(AnalysisUsage &AU) const {		void X86InsertPrefetch::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesAll();		AU.setPreservesAll();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}
▲ Show 20 Lines • Show All 76 Lines • Show Last 20 Lines

llvm/lib/Transforms/IPO/SampleProfile.cpp

Show First 20 Lines • Show All 1,736 Lines • ▼ Show 20 Lines	if (std::error_code EC = ReaderOrErr.getError()) {
Ctx.diagnose(DiagnosticInfoSampleProfile(Filename, Msg));		Ctx.diagnose(DiagnosticInfoSampleProfile(Filename, Msg));
return false;		return false;
}		}
Reader = std::move(ReaderOrErr.get());		Reader = std::move(ReaderOrErr.get());
Reader->setSkipFlatProf(LTOPhase == ThinOrFullLTOPhase::ThinLTOPostLink);		Reader->setSkipFlatProf(LTOPhase == ThinOrFullLTOPhase::ThinLTOPostLink);
// set module before reading the profile so reader may be able to only		// set module before reading the profile so reader may be able to only
// read the function profiles which are used by the current module.		// read the function profiles which are used by the current module.
Reader->setModule(&M);		Reader->setModule(&M);
		Reader->setDiscriminatorMaskedBitFrom(DILocation::getBaseDiscriminatorBits());
if (std::error_code EC = Reader->read()) {		if (std::error_code EC = Reader->read()) {
std::string Msg = "profile reading failed: " + EC.message();		std::string Msg = "profile reading failed: " + EC.message();
Ctx.diagnose(DiagnosticInfoSampleProfile(Filename, Msg));		Ctx.diagnose(DiagnosticInfoSampleProfile(Filename, Msg));
return false;		return false;
}		}

PSL = Reader->getProfileSymbolList();		PSL = Reader->getProfileSymbolList();

▲ Show 20 Lines • Show All 225 Lines • Show Last 20 Lines

llvm/lib/Transforms/Utils/LoopUnroll.cpp

Show First 20 Lines • Show All 570 Lines • ▼ Show 20 Lines	LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
// Loop Unrolling might create new loops. While we do preserve LoopInfo, we		// Loop Unrolling might create new loops. While we do preserve LoopInfo, we
// might break loop-simplified form for these loops (as they, e.g., would		// might break loop-simplified form for these loops (as they, e.g., would
// share the same exit blocks). We'll keep track of loops for which we can		// share the same exit blocks). We'll keep track of loops for which we can
// break this so that later we can re-simplify them.		// break this so that later we can re-simplify them.
SmallSetVector<Loop *, 4> LoopsToSimplify;		SmallSetVector<Loop *, 4> LoopsToSimplify;
for (Loop *SubLoop : *L)		for (Loop *SubLoop : *L)
LoopsToSimplify.insert(SubLoop);		LoopsToSimplify.insert(SubLoop);

if (Header->getParent()->isDebugInfoForProfiling())		if (Header->getParent()->isDebugInfoForProfiling() && !EnableFSDiscriminator)
for (BasicBlock *BB : L->getBlocks())		for (BasicBlock *BB : L->getBlocks())
for (Instruction &I : *BB)		for (Instruction &I : *BB)
if (!isa<DbgInfoIntrinsic>(&I))		if (!isa<DbgInfoIntrinsic>(&I))
if (const DILocation *DIL = I.getDebugLoc()) {		if (const DILocation *DIL = I.getDebugLoc()) {
auto NewDIL = DIL->cloneByMultiplyingDuplicationFactor(ULO.Count);		auto NewDIL = DIL->cloneByMultiplyingDuplicationFactor(ULO.Count);
if (NewDIL)		if (NewDIL)
I.setDebugLoc(NewDIL.getValue());		I.setDebugLoc(NewDIL.getValue());
else		else
▲ Show 20 Lines • Show All 391 Lines • Show Last 20 Lines

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp

Show First 20 Lines • Show All 340 Lines • ▼ Show 20 Lines	llvm::UnrollAndJamLoop(Loop *L, unsigned Count, unsigned TripCount,
// reverse postorder so that LastValueMap contains the correct value at each		// reverse postorder so that LastValueMap contains the correct value at each
// exit.		// exit.
LoopBlocksDFS DFS(L);		LoopBlocksDFS DFS(L);
DFS.perform(LI);		DFS.perform(LI);
// Stash the DFS iterators before adding blocks to the loop.		// Stash the DFS iterators before adding blocks to the loop.
LoopBlocksDFS::RPOIterator BlockBegin = DFS.beginRPO();		LoopBlocksDFS::RPOIterator BlockBegin = DFS.beginRPO();
LoopBlocksDFS::RPOIterator BlockEnd = DFS.endRPO();		LoopBlocksDFS::RPOIterator BlockEnd = DFS.endRPO();

if (Header->getParent()->isDebugInfoForProfiling())		if (Header->getParent()->isDebugInfoForProfiling() && !EnableFSDiscriminator)
for (BasicBlock *BB : L->getBlocks())		for (BasicBlock *BB : L->getBlocks())
for (Instruction &I : *BB)		for (Instruction &I : *BB)
if (!isa<DbgInfoIntrinsic>(&I))		if (!isa<DbgInfoIntrinsic>(&I))
if (const DILocation *DIL = I.getDebugLoc()) {		if (const DILocation *DIL = I.getDebugLoc()) {
auto NewDIL = DIL->cloneByMultiplyingDuplicationFactor(Count);		auto NewDIL = DIL->cloneByMultiplyingDuplicationFactor(Count);
if (NewDIL)		if (NewDIL)
I.setDebugLoc(NewDIL.getValue());		I.setDebugLoc(NewDIL.getValue());
else		else
▲ Show 20 Lines • Show All 645 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Show First 20 Lines • Show All 1,027 Lines • ▼ Show 20 Lines	static Instruction *getDebugLocFromInstOrOperands(Instruction *I) {

return I;		return I;
}		}

void InnerLoopVectorizer::setDebugLocFromInst(IRBuilder<> &B, const Value *Ptr) {		void InnerLoopVectorizer::setDebugLocFromInst(IRBuilder<> &B, const Value *Ptr) {
if (const Instruction *Inst = dyn_cast_or_null<Instruction>(Ptr)) {		if (const Instruction *Inst = dyn_cast_or_null<Instruction>(Ptr)) {
const DILocation *DIL = Inst->getDebugLoc();		const DILocation *DIL = Inst->getDebugLoc();
if (DIL && Inst->getFunction()->isDebugInfoForProfiling() &&		if (DIL && Inst->getFunction()->isDebugInfoForProfiling() &&
!isa<DbgInfoIntrinsic>(Inst)) {		!isa<DbgInfoIntrinsic>(Inst) && !EnableFSDiscriminator) {
assert(!VF.isScalable() && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
auto NewDIL =		auto NewDIL =
DIL->cloneByMultiplyingDuplicationFactor(UF * VF.getKnownMinValue());		DIL->cloneByMultiplyingDuplicationFactor(UF * VF.getKnownMinValue());
if (NewDIL)		if (NewDIL)
B.SetCurrentDebugLocation(NewDIL.getValue());		B.SetCurrentDebugLocation(NewDIL.getValue());
else		else
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "Failed to create new discriminator: "		<< "Failed to create new discriminator: "
<< DIL->getFilename() << " Line: " << DIL->getLine());		<< DIL->getFilename() << " Line: " << DIL->getLine());
}		} else
else
B.SetCurrentDebugLocation(DIL);		B.SetCurrentDebugLocation(DIL);
} else		} else
B.SetCurrentDebugLocation(DebugLoc());		B.SetCurrentDebugLocation(DebugLoc());
}		}

/// Write a record \p DebugMsg about vectorization failure to the debug		/// Write a record \p DebugMsg about vectorization failure to the debug
/// output stream. If \p I is passed, it is an instruction that prevents		/// output stream. If \p I is passed, it is an instruction that prevents
/// vectorization.		/// vectorization.
▲ Show 20 Lines • Show All 8,869 Lines • Show Last 20 Lines

llvm/tools/llvm-profdata/llvm-profdata.cpp

Show All 15 Lines
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/ProfileData/InstrProfReader.h"		#include "llvm/ProfileData/InstrProfReader.h"
#include "llvm/ProfileData/InstrProfWriter.h"		#include "llvm/ProfileData/InstrProfWriter.h"
#include "llvm/ProfileData/ProfileCommon.h"		#include "llvm/ProfileData/ProfileCommon.h"
#include "llvm/ProfileData/SampleProfReader.h"		#include "llvm/ProfileData/SampleProfReader.h"
#include "llvm/ProfileData/SampleProfWriter.h"		#include "llvm/ProfileData/SampleProfWriter.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Errc.h"		#include "llvm/Support/Errc.h"
		#include "llvm/Support/FSAFDODiscriminator.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Format.h"		#include "llvm/Support/Format.h"
#include "llvm/Support/FormattedStream.h"		#include "llvm/Support/FormattedStream.h"
#include "llvm/Support/InitLLVM.h"		#include "llvm/Support/InitLLVM.h"
#include "llvm/Support/MemoryBuffer.h"		#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"
#include "llvm/Support/ThreadPool.h"		#include "llvm/Support/ThreadPool.h"
#include "llvm/Support/Threading.h"		#include "llvm/Support/Threading.h"
▲ Show 20 Lines • Show All 413 Lines • ▼ Show 20 Lines	static void updateInstrProfileEntry(InstrProfileEntry &IFE,
ProfRecord->scale(Numerator, Denominator, [&](instrprof_error E) {		ProfRecord->scale(Numerator, Denominator, [&](instrprof_error E) {
warn(toString(make_error<InstrProfError>(E)));		warn(toString(make_error<InstrProfError>(E)));
});		});
}		}

const uint64_t ColdPercentileIdx = 15;		const uint64_t ColdPercentileIdx = 15;
const uint64_t HotPercentileIdx = 11;		const uint64_t HotPercentileIdx = 11;

		static uint32_t MaskHighBitFrom = 31;

/// Adjust the instr profile in \p WC based on the sample profile in		/// Adjust the instr profile in \p WC based on the sample profile in
/// \p Reader.		/// \p Reader.
static void		static void
adjustInstrProfile(std::unique_ptr<WriterContext> &WC,		adjustInstrProfile(std::unique_ptr<WriterContext> &WC,
std::unique_ptr<sampleprof::SampleProfileReader> &Reader,		std::unique_ptr<sampleprof::SampleProfileReader> &Reader,
unsigned SupplMinSizeThreshold, float ZeroCounterThreshold,		unsigned SupplMinSizeThreshold, float ZeroCounterThreshold,
unsigned InstrProfColdThreshold) {		unsigned InstrProfColdThreshold) {
// Function to its entry in instr profile.		// Function to its entry in instr profile.
▲ Show 20 Lines • Show All 80 Lines • ▼ Show 20 Lines	static void supplementInstrProfile(

// Read sample profile.		// Read sample profile.
LLVMContext Context;		LLVMContext Context;
auto ReaderOrErr =		auto ReaderOrErr =
sampleprof::SampleProfileReader::create(SampleFilename.str(), Context);		sampleprof::SampleProfileReader::create(SampleFilename.str(), Context);
if (std::error_code EC = ReaderOrErr.getError())		if (std::error_code EC = ReaderOrErr.getError())
exitWithErrorCode(EC, SampleFilename);		exitWithErrorCode(EC, SampleFilename);
auto Reader = std::move(ReaderOrErr.get());		auto Reader = std::move(ReaderOrErr.get());
		Reader->setDiscriminatorMaskedBitFrom(MaskHighBitFrom);
if (std::error_code EC = Reader->read())		if (std::error_code EC = Reader->read())
exitWithErrorCode(EC, SampleFilename);		exitWithErrorCode(EC, SampleFilename);

// Read instr profile.		// Read instr profile.
std::mutex ErrorLock;		std::mutex ErrorLock;
SmallSet<instrprof_error, 4> WriterErrorCodes;		SmallSet<instrprof_error, 4> WriterErrorCodes;
auto WC = std::make_unique<WriterContext>(OutputSparse, ErrorLock,		auto WC = std::make_unique<WriterContext>(OutputSparse, ErrorLock,
WriterErrorCodes);		WriterErrorCodes);
Show All 10 Lines
/// by the provided symbol remapper.		/// by the provided symbol remapper.
static sampleprof::FunctionSamples		static sampleprof::FunctionSamples
remapSamples(const sampleprof::FunctionSamples &Samples,		remapSamples(const sampleprof::FunctionSamples &Samples,
SymbolRemapper &Remapper, sampleprof_error &Error) {		SymbolRemapper &Remapper, sampleprof_error &Error) {
sampleprof::FunctionSamples Result;		sampleprof::FunctionSamples Result;
Result.setName(Remapper(Samples.getName()));		Result.setName(Remapper(Samples.getName()));
Result.addTotalSamples(Samples.getTotalSamples());		Result.addTotalSamples(Samples.getTotalSamples());
Result.addHeadSamples(Samples.getHeadSamples());		Result.addHeadSamples(Samples.getHeadSamples());

		uint32_t DiscriminatorMask = getN1Bits(MaskHighBitFrom);
for (const auto &BodySample : Samples.getBodySamples()) {		for (const auto &BodySample : Samples.getBodySamples()) {
Result.addBodySamples(BodySample.first.LineOffset,		uint32_t MaskedDiscriminator =
BodySample.first.Discriminator,		BodySample.first.Discriminator & DiscriminatorMask;
		Result.addBodySamples(BodySample.first.LineOffset, MaskedDiscriminator,
BodySample.second.getSamples());		BodySample.second.getSamples());
for (const auto &Target : BodySample.second.getCallTargets()) {		for (const auto &Target : BodySample.second.getCallTargets()) {
Result.addCalledTargetSamples(BodySample.first.LineOffset,		Result.addCalledTargetSamples(BodySample.first.LineOffset,
BodySample.first.Discriminator,		MaskedDiscriminator,
Remapper(Target.first()), Target.second);		Remapper(Target.first()), Target.second);
}		}
}		}
for (const auto &CallsiteSamples : Samples.getCallsiteSamples()) {		for (const auto &CallsiteSamples : Samples.getCallsiteSamples()) {
sampleprof::FunctionSamplesMap &Target =		sampleprof::FunctionSamplesMap &Target =
Result.functionSamplesAt(CallsiteSamples.first);		Result.functionSamplesAt(CallsiteSamples.first);
for (const auto &Callsite : CallsiteSamples.second) {		for (const auto &Callsite : CallsiteSamples.second) {
sampleprof::FunctionSamples Remapped =		sampleprof::FunctionSamples Remapped =
▲ Show 20 Lines • Show All 91 Lines • ▼ Show 20 Lines	for (const auto &Input : Inputs) {
}		}

// We need to keep the readers around until after all the files are		// We need to keep the readers around until after all the files are
// read so that we do not lose the function names stored in each		// read so that we do not lose the function names stored in each
// reader's memory. The function names are needed to write out the		// reader's memory. The function names are needed to write out the
// merged profile map.		// merged profile map.
Readers.push_back(std::move(ReaderOrErr.get()));		Readers.push_back(std::move(ReaderOrErr.get()));
const auto Reader = Readers.back().get();		const auto Reader = Readers.back().get();
		Reader->setDiscriminatorMaskedBitFrom(MaskHighBitFrom);
if (std::error_code EC = Reader->read()) {		if (std::error_code EC = Reader->read()) {
warnOrExitGivenError(FailMode, EC, Input.Filename);		warnOrExitGivenError(FailMode, EC, Input.Filename);
Readers.pop_back();		Readers.pop_back();
continue;		continue;
}		}

StringMap<FunctionSamples> &Profiles = Reader->getProfiles();		StringMap<FunctionSamples> &Profiles = Reader->getProfiles();
if (ProfileIsProbeBased.hasValue() &&		if (ProfileIsProbeBased.hasValue() &&
▲ Show 20 Lines • Show All 177 Lines • ▼ Show 20 Lines	cl::opt<std::string> SupplInstrWithSample(
"profile is the input of the flag. Output will be in instr "		"profile is the input of the flag. Output will be in instr "
"format (The flag only works with -instr)"));		"format (The flag only works with -instr)"));
cl::opt<float> ZeroCounterThreshold(		cl::opt<float> ZeroCounterThreshold(
"zero-counter-threshold", cl::init(0.7), cl::Hidden,		"zero-counter-threshold", cl::init(0.7), cl::Hidden,
cl::desc("For the function which is cold in instr profile but hot in "		cl::desc("For the function which is cold in instr profile but hot in "
"sample profile, if the ratio of the number of zero counters "		"sample profile, if the ratio of the number of zero counters "
"divided by the total number of counters is above the "		"divided by the total number of counters is above the "
"threshold, the profile of the function will be regarded as "		"threshold, the profile of the function will be regarded as "
"being harmful for performance and will be dropped. "));		"being harmful for performance and will be dropped."));
cl::opt<unsigned> SupplMinSizeThreshold(		cl::opt<unsigned> SupplMinSizeThreshold(
"suppl-min-size-threshold", cl::init(10), cl::Hidden,		"suppl-min-size-threshold", cl::init(10), cl::Hidden,
cl::desc("If the size of a function is smaller than the threshold, "		cl::desc("If the size of a function is smaller than the threshold, "
"assume it can be inlined by PGO early inliner and it won't "		"assume it can be inlined by PGO early inliner and it won't "
"be adjusted based on sample profile. "));		"be adjusted based on sample profile."));
cl::opt<unsigned> InstrProfColdThreshold(		cl::opt<unsigned> InstrProfColdThreshold(
"instr-prof-cold-threshold", cl::init(0), cl::Hidden,		"instr-prof-cold-threshold", cl::init(0), cl::Hidden,
cl::desc("User specified cold threshold for instr profile which will "		cl::desc("User specified cold threshold for instr profile which will "
"override the cold threshold got from profile summary. "));		"override the cold threshold got from profile summary."));
		cl::opt<unsigned> MaskHighBitFromVal(
		"mask-highbit-from", cl::init(31), cl::Hidden,
		cl::desc("Zero out the discriminator bit from this value (0 based) "
		"for example, value 11 will only use base discriminator; "
		"17 will use base and second round; 23 will use first 3 rounds."));

cl::ParseCommandLineOptions(argc, argv, "LLVM profile data merger\n");		cl::ParseCommandLineOptions(argc, argv, "LLVM profile data merger\n");
		MaskHighBitFrom = MaskHighBitFromVal.getValue();

WeightedFileVector WeightedInputs;		WeightedFileVector WeightedInputs;
for (StringRef Filename : InputFilenames)		for (StringRef Filename : InputFilenames)
addWeightedInput(WeightedInputs, {std::string(Filename), 1});		addWeightedInput(WeightedInputs, {std::string(Filename), 1});
for (StringRef WeightedFilename : WeightedInputFilenames)		for (StringRef WeightedFilename : WeightedInputFilenames)
addWeightedInput(WeightedInputs, parseWeightedFile(WeightedFilename));		addWeightedInput(WeightedInputs, parseWeightedFile(WeightedFilename));

// Make sure that the file buffer stays alive for the duration of the		// Make sure that the file buffer stays alive for the duration of the
[652 unchanged lines hidden]	FuncSimilarity = weightForFuncSimilarity(FuncInternalSimilarity,
BaseFuncSample, TestFuncSample);		BaseFuncSample, TestFuncSample);
return FuncSimilarity;		return FuncSimilarity;
}		}

void SampleOverlapAggregator::computeSampleProfileOverlap(raw_fd_ostream &OS) {		void SampleOverlapAggregator::computeSampleProfileOverlap(raw_fd_ostream &OS) {
using namespace sampleprof;		using namespace sampleprof;

StringMap<const FunctionSamples *> BaseFuncProf;		StringMap<const FunctionSamples *> BaseFuncProf;

const auto &BaseProfiles = BaseReader->getProfiles();		const auto &BaseProfiles = BaseReader->getProfiles();
for (const auto &BaseFunc : BaseProfiles) {		for (const auto &BaseFunc : BaseProfiles) {
BaseFuncProf.try_emplace(BaseFunc.second.getNameWithContext(),		BaseFuncProf.try_emplace(BaseFunc.second.getNameWithContext(),
&(BaseFunc.second));		&(BaseFunc.second));
}		}
ProfOverlap.UnionCount = BaseFuncProf.size();		ProfOverlap.UnionCount = BaseFuncProf.size();

const auto &TestProfiles = TestReader->getProfiles();		const auto &TestProfiles = TestReader->getProfiles();
[266 unchanged lines hidden]	std::error_code SampleOverlapAggregator::loadProfiles() {

auto TestReaderOrErr = SampleProfileReader::create(TestFilename, Context);		auto TestReaderOrErr = SampleProfileReader::create(TestFilename, Context);
if (std::error_code EC = TestReaderOrErr.getError())		if (std::error_code EC = TestReaderOrErr.getError())
exitWithErrorCode(EC, TestFilename);		exitWithErrorCode(EC, TestFilename);

BaseReader = std::move(BaseReaderOrErr.get());		BaseReader = std::move(BaseReaderOrErr.get());
TestReader = std::move(TestReaderOrErr.get());		TestReader = std::move(TestReaderOrErr.get());

		BaseReader->setDiscriminatorMaskedBitFrom(MaskHighBitFrom);
		TestReader->setDiscriminatorMaskedBitFrom(MaskHighBitFrom);

if (std::error_code EC = BaseReader->read())		if (std::error_code EC = BaseReader->read())
exitWithErrorCode(EC, BaseFilename);		exitWithErrorCode(EC, BaseFilename);
if (std::error_code EC = TestReader->read())		if (std::error_code EC = TestReader->read())
exitWithErrorCode(EC, TestFilename);		exitWithErrorCode(EC, TestFilename);
if (BaseReader->profileIsProbeBased() != TestReader->profileIsProbeBased())		if (BaseReader->profileIsProbeBased() != TestReader->profileIsProbeBased())
exitWithError(		exitWithError(
"cannot compare probe-based profile with non-probe-based profile");		"cannot compare probe-based profile with non-probe-based profile");
if (BaseReader->profileIsCS() != TestReader->profileIsCS())		if (BaseReader->profileIsCS() != TestReader->profileIsCS())
[495 unchanged lines hidden]	static int showSampleProfile(const std::string &Filename, bool ShowCounts,
using namespace sampleprof;		using namespace sampleprof;
LLVMContext Context;		LLVMContext Context;
auto ReaderOrErr = SampleProfileReader::create(Filename, Context);		auto ReaderOrErr = SampleProfileReader::create(Filename, Context);
if (std::error_code EC = ReaderOrErr.getError())		if (std::error_code EC = ReaderOrErr.getError())
exitWithErrorCode(EC, Filename);		exitWithErrorCode(EC, Filename);

auto Reader = std::move(ReaderOrErr.get());		auto Reader = std::move(ReaderOrErr.get());

		Reader->setDiscriminatorMaskedBitFrom(MaskHighBitFrom);

if (ShowSectionInfoOnly) {		if (ShowSectionInfoOnly) {
showSectionInfo(Reader.get(), OS);		showSectionInfo(Reader.get(), OS);
return 0;		return 0;
}		}

if (std::error_code EC = Reader->read())		if (std::error_code EC = Reader->read())
exitWithErrorCode(EC, Filename);		exitWithErrorCode(EC, Filename);

[75 unchanged lines hidden]	static int show_main(int argc, const char *argv[]) {
cl::opt<bool> ShowProfileSymbolList(		cl::opt<bool> ShowProfileSymbolList(
"show-prof-sym-list", cl::init(false),		"show-prof-sym-list", cl::init(false),
cl::desc("Show profile symbol list if it exists in the profile. "));		cl::desc("Show profile symbol list if it exists in the profile. "));
cl::opt<bool> ShowSectionInfoOnly(		cl::opt<bool> ShowSectionInfoOnly(
"show-sec-info-only", cl::init(false),		"show-sec-info-only", cl::init(false),
cl::desc("Show the information of each section in the sample profile. "		cl::desc("Show the information of each section in the sample profile. "
"The flag is only usable when the sample profile is in "		"The flag is only usable when the sample profile is in "
"extbinary format"));		"extbinary format"));
		cl::opt<unsigned> MaskHighBitFrom1(
		"mask-highbit-from", cl::init(31), cl::Hidden,
		cl::desc("Zero out the discriminator bits from this value (0 based). "
		"For example, value 11 will use only the base discriminator; "
		"17 will use the base and second rounds; 23 will use the first 3 rounds."));

cl::ParseCommandLineOptions(argc, argv, "LLVM profile data summary\n");		cl::ParseCommandLineOptions(argc, argv, "LLVM profile data summary\n");
		MaskHighBitFrom = MaskHighBitFrom1.getValue();

if (OutputFilename.empty())		if (OutputFilename.empty())
OutputFilename = "-";		OutputFilename = "-";

if (Filename == OutputFilename) {		if (Filename == OutputFilename) {
errs() << sys::path::filename(argv[0])		errs() << sys::path::filename(argv[0])
<< ": Input file name cannot be the same as the output file name!\n";		<< ": Input file name cannot be the same as the output file name!\n";
return 1;		return 1;
[63 unchanged lines hidden]
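
The `mask-highbit-from` option in the llvm-profdata.cpp diff above describes its effect only through examples (11, 17, 23, and the default 31). A minimal sketch of one reading consistent with those examples follows. It assumes the value N means "keep discriminator bits 0 through N and zero everything above", so the default of 31 leaves a 32-bit discriminator untouched. The helper name `keepLowBitsThrough` is hypothetical and does not appear in the patch; the actual masking lives in the sample profile reader and may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the patch.
// Assumption: bits 0..LastKeptBit (inclusive) survive and all higher bits
// are cleared. This reading is inferred from the option description's
// examples, not confirmed against the reader implementation.
inline uint32_t keepLowBitsThrough(uint32_t Discriminator,
                                   unsigned LastKeptBit) {
  assert(LastKeptBit < 32 && "bit index out of range");
  // Shifting by (31 - LastKeptBit) is always in [0, 31], so there is no
  // undefined behavior even when LastKeptBit is 31.
  const uint32_t Mask = 0xFFFFFFFFu >> (31u - LastKeptBit);
  return Discriminator & Mask;
}
```

Under this assumption, a value of 11 keeps the low 12 bits, 17 keeps the low 18 bits, and 23 keeps the low 24 bits, which matches the "base only", "base and second round", and "first 3 rounds" wording in the option description.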

llvm/unittests/ProfileData/SampleProfTest.cpp

//===- unittest/ProfileData/SampleProfTest.cpp ------------------- C++ --===//		//===- unittest/ProfileData/SampleProfTest.cpp ------------------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/ProfileData/SampleProf.h"		#include "llvm/ProfileData/SampleProf.h"
#include "llvm/ADT/StringMap.h"		#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
		#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/ProfileData/SampleProfReader.h"		#include "llvm/ProfileData/SampleProfReader.h"
#include "llvm/ProfileData/SampleProfWriter.h"		#include "llvm/ProfileData/SampleProfWriter.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/ErrorOr.h"		#include "llvm/Support/ErrorOr.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
[36 unchanged lines hidden]	struct SampleProfTest : ::testing::Test {

void readProfile(const Module &M, StringRef Profile,		void readProfile(const Module &M, StringRef Profile,
StringRef RemapFile = "") {		StringRef RemapFile = "") {
auto ReaderOrErr = SampleProfileReader::create(		auto ReaderOrErr = SampleProfileReader::create(
std::string(Profile), Context, std::string(RemapFile));		std::string(Profile), Context, std::string(RemapFile));
ASSERT_TRUE(NoError(ReaderOrErr.getError()));		ASSERT_TRUE(NoError(ReaderOrErr.getError()));
Reader = std::move(ReaderOrErr.get());		Reader = std::move(ReaderOrErr.get());
Reader->setModule(&M);		Reader->setModule(&M);
		Reader->setDiscriminatorMaskedBitFrom(
		DILocation::getBaseDiscriminatorBits());
}		}

TempFile createRemapFile() {		TempFile createRemapFile() {
return TempFile("remapfile", "", R"(		return TempFile("remapfile", "", R"(
# Types 'int' and 'long' are equivalent		# Types 'int' and 'long' are equivalent
type i l		type i l
# Function names 'foo' and 'faux' are equivalent		# Function names 'foo' and 'faux' are equivalent
name 3foo 4faux		name 3foo 4faux
[461 unchanged lines hidden]

Revision Contents

Diff 332450

llvm/include/llvm/CodeGen/FlowSensitiveSampleProfile.h

llvm/include/llvm/CodeGen/MachineDominators.h

llvm/include/llvm/CodeGen/MachineOptimizationRemarkEmitter.h

llvm/include/llvm/CodeGen/Passes.h

llvm/include/llvm/IR/DebugInfoMetadata.h

llvm/include/llvm/InitializePasses.h

llvm/include/llvm/LTO/Config.h

llvm/include/llvm/ProfileData/SampleProfReader.h

llvm/include/llvm/Support/FSAFDODiscriminator.h

llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h

llvm/lib/CodeGen/CMakeLists.txt

llvm/lib/CodeGen/FlowSensitiveSampleProfile.cpp

llvm/lib/CodeGen/TargetPassConfig.cpp

llvm/lib/LTO/LTOBackend.cpp

llvm/lib/Passes/PassBuilder.cpp

llvm/lib/ProfileData/SampleProf.cpp

llvm/lib/ProfileData/SampleProfReader.cpp

llvm/lib/Target/X86/X86InsertPrefetch.cpp

llvm/lib/Transforms/IPO/SampleProfile.cpp

llvm/lib/Transforms/Utils/LoopUnroll.cpp

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

llvm/tools/llvm-profdata/llvm-profdata.cpp

llvm/unittests/ProfileData/SampleProfTest.cpp
