This is an archive of the discontinued LLVM Phabricator instance.


Differential D136627

[SampleFDO] Compute and report profile staleness metrics
ClosedPublic

Authored by wlei on Oct 24 2022, 10:51 AM.


Details

Reviewers

hoy
wenlei
xur
davidxl

Commits

rGd6a0585dd1b8: [SampleFDO] Compute and report profile staleness metrics

Summary

When a profile is stale, profile mismatches can occur and the mismatched samples are discarded. We would like to compute mismatch metrics that quantify how stale the profile is, so that a high number prompts the user to refresh the profile.

Two sets of metrics are introduced here:

(Num_of_mismatched_funchash / Total_profiled_funchash), (Samples_of_mismatched_func_hash / Samples_of_profiled_function): This leverages the checksum attribute of FunctionSamples, which is a pseudo-probe feature. When the CFG of the source code changes, the function checksum changes, and the sample loader later discards all samples of that function. These metrics show what fraction of functions and samples are discarded for this reason.
(Num_of_mismatched_callsite / Total_profiled_callsite), (Samples_of_mismatched_callsite / Samples_of_profiled_callsite): This shows how many callsite locations mismatch. Callsite location mismatch affects inlining, which is highly correlated with performance. The matcher goes through all callsite locations in the IR and in the profile, matches them by call target name, and reports the number of profile samples that do not match an IR callsite.
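
As a rough illustration of how each metric pair above is accumulated, here is a minimal standalone sketch. `StalenessCounters`, `record`, and `formatRatio` are hypothetical names chosen for this note; they are not part of the patch, which keeps separate `uint64_t` members on `SampleProfileMatcher`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical counter bundle for one metric pair (functions or callsites).
struct StalenessCounters {
  uint64_t NumMismatched = 0;     // profiled entities that did not match the IR
  uint64_t NumProfiled = 0;       // all profiled entities seen
  uint64_t MismatchedSamples = 0; // samples attached to mismatched entities
  uint64_t TotalSamples = 0;      // samples attached to all profiled entities

  // Record one profiled entity and whether it matched the IR.
  void record(uint64_t Samples, bool Matched) {
    ++NumProfiled;
    TotalSamples += Samples;
    if (!Matched) {
      ++NumMismatched;
      MismatchedSamples += Samples;
    }
  }
};

// Render a "(mismatched/total)" pair in the style used by the summary.
std::string formatRatio(uint64_t Mismatched, uint64_t Total) {
  assert(Mismatched <= Total && "mismatched count cannot exceed the total");
  return "(" + std::to_string(Mismatched) + "/" + std::to_string(Total) + ")";
}
```

Keeping both a count ratio and a sample ratio separates how widespread the mismatch is from how many samples it costs.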

This is implemented in a new class (SampleProfileMatcher) and guarded by a switch ("--report-profile-staleness"). We plan to extend it with a fuzzy profile matching feature in the future.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wlei created this revision.Oct 24 2022, 10:51 AM

Herald added a project: Restricted Project. Oct 24 2022, 10:51 AM

Herald added subscribers: ormris, hoy, wenlei, hiraditya.

wlei requested review of this revision. Oct 24 2022, 10:51 AM

Herald added a project: Restricted Project. Oct 24 2022, 10:51 AM

Herald added a subscriber: llvm-commits.

wlei retitled this revision from [AutoFDO] Compute profile mismatch metrics to [SampleFDO] Compute profile mismatch metrics.Oct 24 2022, 12:00 PM

wlei edited the summary of this revision.

wlei added reviewers: hoy, wenlei, xur, davidxl.

davidxl added inline comments.Oct 24 2022, 12:22 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
132	The stats seems like a useful thing independent of fuzzy matching. Should it be controlled with a different option?
2072	skip non calls earlier in the loop to reduce nesting level.
2089	what does this case cover? inlined indirect call?
2101	use symbolic constant?

Harbormaster completed remote builds in B193987: Diff 470221.Oct 24 2022, 12:43 PM

wenlei added inline comments.Oct 24 2022, 12:55 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
132	agreed. something like `-report-profile-staleness`?
430	How about having another metric to represent how many functions have a mismatched profile? That should tell the breadth of the mismatch, while your current metric tells its severity.
2051	nit: white space line in between.
2069–2072	Intrinsic is narrower than Call, the checks seem out of order.
2083–2084	For readability, restructure it this way? if (CalleeName.empty()) // indirect call in IR ... else // direct call in IR ...
2086	nit: `CTM->find(CalleeName) != CTM->end()` -> `CTM->count(CalleeName)`
2090	nit: `CallsiteFS->find(CalleeName) != CallsiteFS->end()))` -> `CallsiteFS->count(CalleeName)`
2134–2137	Is this always going to be 0 for non-probe case? Should we omit this message when probe isn't used?
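
For reference, the `find(...) != end()` to `count(...)` nits above do not change behavior. A minimal sketch with a `std::map` standing in for the profile's call target map (the alias `CallTargetMap` and both helper names are assumptions made for this note):

```cpp
#include <cstdint>
#include <map>
#include <string>

// Stand-in for a call target map: callee name -> sample count.
using CallTargetMap = std::map<std::string, uint64_t>;

// Original spelling from the first revision of the diff.
bool hasTargetViaFind(const CallTargetMap *CTM, const std::string &Name) {
  return CTM && CTM->find(Name) != CTM->end();
}

// Suggested spelling: count() returns 0 or 1 for a unique-key map.
bool hasTargetViaCount(const CallTargetMap *CTM, const std::string &Name) {
  return CTM && CTM->count(Name) != 0;
}
```

Both helpers also guard against a null map pointer, mirroring the `CTM && ...` checks in the diff.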

davidxl added inline comments.Oct 24 2022, 1:13 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
132	sounds good.

wlei marked 3 inline comments as done.Oct 24 2022, 2:26 PM

wlei added inline comments.

llvm/lib/Transforms/IPO/SampleProfile.cpp
132	Good point, replaced the `sample-profile-fuzzy-match`, will add it back when fuzzy match is ready.
430	Sounds good.(We already have the same stats(https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1752) but it's only available with the `-stats`).
2051	fixed, thanks!
2069–2072	fixed, thanks!
2089	Yes, this is to avoid false positives, otherwise all the indirect call function samples will be reported as mismatching, updated the comments.
2101	fixed, thanks!
2134–2137	Sounds good, made it under the probe condition.
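
The direct versus indirect call handling discussed for lines 2083–2090 can be sketched outside LLVM as follows. This is only an illustration: `TargetMap` and `callsiteMatches` are hypothetical, and plain `std::map` objects replace the results of `findCallTargetMapAt` and `findFunctionSamplesMapAt`.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Stand-in for the per-callsite maps in the profile: callee name -> samples.
using TargetMap = std::map<std::string, uint64_t>;

// Decide whether an IR callsite is matched by the profile at the same location.
// CalleeName is empty for an indirect call, as in the diff.
bool callsiteMatches(const std::string &CalleeName, const TargetMap *CallTargets,
                     const TargetMap *InlinedCallees) {
  if (CalleeName.empty()) {
    // Indirect call: no name to compare, so conservatively accept any
    // profile entry recorded at this location to avoid false positives.
    return (CallTargets && !CallTargets->empty()) ||
           (InlinedCallees && !InlinedCallees->empty());
  }
  // Direct call: the callee name must appear in one of the two maps.
  return (CallTargets && CallTargets->count(CalleeName) != 0) ||
         (InlinedCallees && InlinedCallees->count(CalleeName) != 0);
}
```

The conservative indirect-call branch trades some missed mismatches for not flagging every indirect call sample as stale.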

addressing reviewers' feedback

hoy added inline comments.Oct 24 2022, 2:51 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2108	Should invalid line offset also be considered mismatched samples? The samples will be discarded anyway. BTW, sort of having an impression that the invalid line offset checking is used anywhere else. Can we unify the usage if that's the case?
2212	nit: rename `matchProfiles` something like `detectProfileMismatch` directly? The name of the switch sounds like only mismatch detection should be done here.

wlei added inline comments.Oct 24 2022, 3:22 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2108	From my observation, an invalid (negative) offset will never match any offset in the IR, so we actually need to remove it from the profile rather than report it as a mismatch. Recall that we tried to remove this offset in llvm-profgen; however, it caused a regression (likely because it affects the sample hot cutoff), so we chose to keep it. As for the mismatch, I'm thinking of reporting only the samples caused by a real stale-profile issue; counting these might mislead the user. "BTW, sort of having an impression that the invalid line offset checking is used anywhere else. Can we unify the usage if that's the case?" Yeah, that's from llvm-profgen, but that diff is reverted.
2212	Fixed, thanks!

rename matchProfiles --> detectProfileMismatch

wenlei added inline comments.Oct 24 2022, 3:56 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2144	When the flag is on, we should always emit the messages, regardless of debug build or not. Same for the one below.

Updating D136627: [SampleFDO] Compute profile mismatch metrics

llvm/lib/Transforms/IPO/SampleProfile.cpp
2144	Sounds good! Changed to use `outs`

hoy added inline comments.Oct 24 2022, 5:13 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2069–2072	nit: maybe more explicit: if (!isa<CallInst>(I) && !isa<InvokeInst>(I)) continue;
2108	Sounds good to exclude invalid line offsets. "yeah, that's from llvm-profgen, but that diff is reverted." I see. Factor out the checking logic here to be a general function, maybe a static function of `LineLocation`?

Harbormaster completed remote builds in B194059: Diff 470323.Oct 24 2022, 5:57 PM

wenlei added inline comments.Oct 24 2022, 8:33 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
430	Sorry I wasn't explicit enough, but I was thinking about doing the same for both call site and function. I.e. have both samples mismatch and number of function/callsite mismatch.
2108	IIRC, these represent samples that can't be attributed to a particular line, but excluding them in profile generation changes the total samples of a function, which led to a regression. I think excluding them from the mismatch count makes sense because there's not much the user can do about these, i.e. a profile refresh won't make them go away.
2144	I don't know if there's well established convention, but I think most of similar dump go to stderr (i.e. all IR dump goes to stderr).
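
The invalid line offset check debated for line 2108 is a single mask test (`LineOffset & 0x8000` in the diff). A standalone sketch follows. The constant name `InvalidLineOffsetMask` and the helper `encodeLineOffset` are hypothetical, and the 16-bit truncation of offsets is an assumption made here to show why a negative offset ends up with that bit set.

```cpp
#include <cstdint>

// Mask value taken from the diff; the constant name is hypothetical.
constexpr uint32_t InvalidLineOffsetMask = 0x8000;

// Same test as the isInvalidLineOffset lambda in the diff.
bool isInvalidLineOffset(uint32_t LineOffset) {
  return (LineOffset & InvalidLineOffsetMask) != 0;
}

// Hypothetical encoder: assumes offsets are stored relative to the function
// header line and truncated to 16 bits, so a negative offset sets bit 15.
uint32_t encodeLineOffset(int32_t Line, int32_t HeaderLine) {
  return static_cast<uint32_t>(Line - HeaderLine) & 0xffff;
}
```

Under that assumption, a sample attributed to a line before the function header is skipped by the mismatch counters rather than reported as stale.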

wenlei added inline comments.Oct 24 2022, 8:46 PM

llvm/include/llvm/ProfileData/SampleProf.h
308–309	It doesn't matter on our platform, but just pointing out that this code isn't portable on host target with size_t -> uint32_t, in which case, the hash becomes just Loc.Discriminator.

wlei added inline comments.Oct 25 2022, 10:34 AM

llvm/include/llvm/ProfileData/SampleProf.h
308–309	This is unintentional, thanks for pointing this out. changed to `uint64_t`
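
To illustrate the portability point above: packing the two 32-bit fields is only lossless if the value is widened to 64 bits before the shift. A minimal sketch, where `Loc`, `packLocation`, and `LocHash` are stand-in names for this note and not the LLVM types:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Stand-in for LineLocation with the same two 32-bit fields.
struct Loc {
  uint32_t LineOffset;
  uint32_t Discriminator;
};

// Widen to uint64_t before shifting so the line offset occupies the high
// 32 bits on every host, regardless of the width of size_t.
uint64_t packLocation(const Loc &L) {
  return (static_cast<uint64_t>(L.LineOffset) << 32) | L.Discriminator;
}

// Hash functor usable with std::unordered_set or std::unordered_map.
struct LocHash {
  std::size_t operator()(const Loc &L) const {
    return std::hash<uint64_t>{}(packLocation(L));
  }
};
```

With a 32-bit intermediate type the shifted line offset would be lost, which is the failure mode wenlei describes.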
llvm/lib/Transforms/IPO/SampleProfile.cpp
430	I see, added callsite num.
2069–2072	Tried this, but I found this `!isa<InvokeInst>(I)` doesn't work for the test in this diff. I found other places in the sampleloader are all using the `isa<IntrinsicInst>(&I)` check, maybe just be consistent with other place?
2108	"I see. Factor out the checking logic here to be a general function, maybe a static function of LineLocation?" Sorry, I meant that the usage in llvm-profgen has been removed and this is now the only place that uses the check. I'm not sure this is a standard way to check whether an offset is valid (standard enough to put it in `LineLocation`), so how about leaving it here and refactoring when other places use it in the future?
2144	Yeah, you're right, dumping into stdout will mess up the IR and make it hard to recompile the IR. Changed to `stderr`

Updating D136627: [SampleFDO] Compute profile mismatch metrics

hoy added inline comments.Oct 25 2022, 10:58 AM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2108	Sounds good.
2154	Still seeing `dbgs()` here. Should it be `stderr`?

wlei added inline comments.Oct 25 2022, 12:08 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2154	I see, I thought dbgs() was the same thing as errs() in non-debug mode. Changed to `errs()`, thanks!

Updating D136627: [SampleFDO] Compute profile mismatch metrics

lgtm, thanks!

This revision is now accepted and ready to land.Oct 25 2022, 12:11 PM

hoy accepted this revision.Oct 25 2022, 1:13 PM

Harbormaster completed remote builds in B194246: Diff 470588.Oct 25 2022, 1:19 PM

wlei mentioned this in D136698: [SampleFDO] Persist profile staleness metrics into binary.Oct 25 2022, 1:45 PM

lgtm

wlei retitled this revision from [SampleFDO] Compute profile mismatch metrics to [SampleFDO] Compute and report profile staleness metrics.Oct 26 2022, 8:54 PM

wlei edited the summary of this revision.

Closed by commit rGd6a0585dd1b8: [SampleFDO] Compute and report profile staleness metrics (authored by wlei). Oct 26 2022, 9:08 PM

This revision was automatically updated to reflect the committed changes.

wlei added a commit: rGd6a0585dd1b8: [SampleFDO] Compute and report profile staleness metrics.

tmsriram added a subscriber: tmsriram.Oct 31 2022, 3:11 PM

wlei mentioned this in rG47b0758049ea: [SampleFDO] Persist profile staleness metrics into binary.Nov 9 2022, 10:35 PM

wlei mentioned this in D147456: [SamplePGO] Stale profile matching(part 1).Apr 5 2023, 6:04 PM

Revision Contents

Path	Size
llvm/include/llvm/ProfileData/SampleProf.h	7 lines
llvm/lib/Transforms/IPO/SampleProfile.cpp	154 lines
llvm/test/Transforms/SampleProfile/Inputs/profile-mismatch.prof	14 lines
llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-profile-mismatch.prof	14 lines
llvm/test/Transforms/SampleProfile/profile-mismatch.ll	197 lines
llvm/test/Transforms/SampleProfile/pseudo-probe-profile-mismatch.ll	233 lines

Diff 471011

llvm/include/llvm/ProfileData/SampleProf.h

[... 297 lines not shown ...]	struct LineLocation {
bool operator!=(const LineLocation &O) const {
return LineOffset != O.LineOffset || Discriminator != O.Discriminator;
}

uint32_t LineOffset;
uint32_t Discriminator;
};

		struct LineLocationHash {
		uint64_t operator()(const LineLocation &Loc) const {
		return std::hash<std::uint64_t>{}((((uint64_t)Loc.LineOffset) << 32) |
		Loc.Discriminator);
		}
		};

raw_ostream &operator<<(raw_ostream &OS, const LineLocation &Loc);

/// Representation of a single sample record.
///
/// A sample record is represented by a positive integer value, which
/// indicates how frequently was the associated line location executed.
///
/// Additionally, if the associated location contains a function call,
[... 1,044 lines not shown ...]

llvm/lib/Transforms/IPO/SampleProfile.cpp

[... 123 lines not shown ...]

// The named file contains a set of transformations that may have been applied
// to the symbol names between the program from which the sample data was
// collected and the current program's symbols.
static cl::opt<std::string> SampleProfileRemappingFile(
"sample-profile-remapping-file", cl::init(""), cl::value_desc("filename"),
cl::desc("Profile remapping file loaded by -sample-profile"), cl::Hidden);

		static cl::opt<bool> ReportProfileStaleness(
		"report-profile-staleness", cl::Hidden, cl::init(false),
		cl::desc("Compute and report stale profile statistical metrics."));

static cl::opt<bool> ProfileSampleAccurate(
"profile-sample-accurate", cl::Hidden, cl::init(false),
cl::desc("If the sample profile is accurate, we will mark all un-sampled "
"callsite and function as having 0 samples. Otherwise, treat "
"un-sampled callsites and functions conservatively as unknown. "));

static cl::opt<bool> ProfileSampleBlockAccurate(
"profile-sample-block-accurate", cl::Hidden, cl::init(false),
[... 269 lines not shown ...]	bool operator()(const InlineCandidate &LHS, const InlineCandidate &RHS) {
return LCS->getGUID(LCS->getName()) < RCS->getGUID(RCS->getName());
}
};

using CandidateQueue =
PriorityQueue<InlineCandidate, std::vector<InlineCandidate>,
CandidateComparer>;

		// Sample profile matching - fuzzy match.
		class SampleProfileMatcher {
		Module &M;
		SampleProfileReader &Reader;
		const PseudoProbeManager *ProbeManager;

		// Profile mismatching statistics.
		uint64_t TotalProfiledCallsite = 0;
		uint64_t NumMismatchedCallsite = 0;
		uint64_t MismatchedCallsiteSamples = 0;
		uint64_t TotalCallsiteSamples = 0;
		uint64_t TotalProfiledFunc = 0;
		uint64_t NumMismatchedFuncHash = 0;
		uint64_t MismatchedFuncHashSamples = 0;
		uint64_t TotalFuncHashSamples = 0;

		public:
		SampleProfileMatcher(Module &M, SampleProfileReader &Reader,
		const PseudoProbeManager *ProbeManager)
		: M(M), Reader(Reader), ProbeManager(ProbeManager) {}
		void detectProfileMismatch();
		void detectProfileMismatch(const Function &F, const FunctionSamples &FS);
		};

/// Sample profile pass.
///
/// This pass reads profile data from the file specified by
/// -sample-profile-file and annotates every affected function with the
/// profile information found in that file.
class SampleProfileLoader final
: public SampleProfileLoaderBaseImpl<BasicBlock> {
public:
[... 113 lines not shown ...]	protected:
bool ProfAccForSymsInList;

// External inline advisor used to replay inline decision from remarks.
std::unique_ptr<InlineAdvisor> ExternalInlineAdvisor;

// A pseudo probe helper to correlate the imported sample counts.
std::unique_ptr<PseudoProbeManager> ProbeManager;

		// A helper to implement the sample profile matching algorithm.
		std::unique_ptr<SampleProfileMatcher> MatchingManager;

private:
const char *getAnnotatedRemarkPassName() const {
return AnnotatedPassName.c_str();
}
};
} // end anonymous namespace

ErrorOr<uint64_t> SampleProfileLoader::getInstWeight(const Instruction &Inst) {
[... 1,451 lines not shown ...]	if (!ProbeManager->moduleIsProbed(M)) {
const char *Msg =
"Pseudo-probe-based profile requires SampleProfileProbePass";
Ctx.diagnose(DiagnosticInfoSampleProfile(M.getModuleIdentifier(), Msg,
DS_Warning));
return false;
}
}

		if (ReportProfileStaleness) {
		MatchingManager =
		std::make_unique<SampleProfileMatcher>(M, *Reader, ProbeManager.get());
		}

return true;
}

		void SampleProfileMatcher::detectProfileMismatch(const Function &F,
		const FunctionSamples &FS) {
		if (FunctionSamples::ProfileIsProbeBased) {
		uint64_t Count = FS.getTotalSamples();
		TotalFuncHashSamples += Count;
		TotalProfiledFunc++;
		if (!ProbeManager->profileIsValid(F, FS)) {
		MismatchedFuncHashSamples += Count;
		NumMismatchedFuncHash++;
		return;
		}
		}

		std::unordered_set<LineLocation, LineLocationHash> MatchedCallsiteLocs;

		// Go through all the callsites on the IR and flag the callsite if the target
		// name is the same as the one in the profile.
		for (auto &BB : F) {
		for (auto &I : BB.getInstList()) {
		if (!isa<CallBase>(&I) || isa<IntrinsicInst>(&I))
		continue;

		const auto *CB = dyn_cast<CallBase>(&I);
		if (auto &DLoc = I.getDebugLoc()) {
		LineLocation IRCallsite = FunctionSamples::getCallSiteIdentifier(DLoc);

		StringRef CalleeName;
		if (Function *Callee = CB->getCalledFunction())
		CalleeName = Callee->getName();

		const auto CTM = FS.findCallTargetMapAt(IRCallsite);
		const auto CallsiteFS = FS.findFunctionSamplesMapAt(IRCallsite);

		// Indirect call case.
		if (CalleeName.empty()) {
		// Since indirect call does not have the CalleeName, check
		// conservatively if callsite in the profile is a callsite location.
		// This is to avoid nums of false positive since otherwise all the
		// indirect call samples will be reported as mismatching.
		if ((CTM && !CTM->empty()) || (CallsiteFS && !CallsiteFS->empty()))
		MatchedCallsiteLocs.insert(IRCallsite);
		} else {
		// Check if the call target name is matched for direct call case.
		if ((CTM && CTM->count(CalleeName)) ||
		(CallsiteFS && CallsiteFS->count(CalleeName)))
		MatchedCallsiteLocs.insert(IRCallsite);
		}
		}
		}
		}

		auto isInvalidLineOffset = [](uint32_t LineOffset) {
		return LineOffset & 0x8000;
		};

		// Check if there are any callsites in the profile that does not match to any
		// IR callsites, those callsite samples will be discarded.
		for (auto &I : FS.getBodySamples()) {
		const LineLocation &Loc = I.first;
		if (isInvalidLineOffset(Loc.LineOffset))
		continue;

		uint64_t Count = I.second.getSamples();
		if (!I.second.getCallTargets().empty()) {
		TotalCallsiteSamples += Count;
		TotalProfiledCallsite++;
		if (!MatchedCallsiteLocs.count(Loc)) {
		MismatchedCallsiteSamples += Count;
		NumMismatchedCallsite++;
		}
		}
		}

		for (auto &I : FS.getCallsiteSamples()) {
		const LineLocation &Loc = I.first;
		if (isInvalidLineOffset(Loc.LineOffset))
		continue;

		uint64_t Count = 0;
		for (auto &FM : I.second) {
		Count += FM.second.getTotalSamples();
		}
		TotalCallsiteSamples += Count;
		TotalProfiledCallsite++;
		if (!MatchedCallsiteLocs.count(Loc)) {
		MismatchedCallsiteSamples += Count;
		NumMismatchedCallsite++;
		}
		}
		}

		void SampleProfileMatcher::detectProfileMismatch() {
		for (auto &F : M) {
		if (F.isDeclaration() || !F.hasFnAttribute("use-sample-profile"))
		continue;
		FunctionSamples *FS = Reader.getSamplesFor(F);
		if (!FS)
		continue;
		detectProfileMismatch(F, *FS);
		}

		if (FunctionSamples::ProfileIsProbeBased) {
		errs() << "(" << NumMismatchedFuncHash << "/" << TotalProfiledFunc << ")"
		<< " of functions' profile are invalid and "
		<< " (" << MismatchedFuncHashSamples << "/" << TotalFuncHashSamples
		<< ")"
		<< " of samples are discarded due to function hash mismatch.\n";
		}
		errs() << "(" << NumMismatchedCallsite << "/" << TotalProfiledCallsite << ")"
		<< " of callsites' profile are invalid and "
		<< "(" << MismatchedCallsiteSamples << "/" << TotalCallsiteSamples
		<< ")"
		<< " of samples are discarded due to callsite location mismatch.\n";
		}

bool SampleProfileLoader::runOnModule(Module &M, ModuleAnalysisManager *AM,
ProfileSummaryInfo _PSI, CallGraph CG) {
GUIDToFuncNameMapper Mapper(M, *Reader, GUIDToFuncNameMap);

PSI = _PSI;
if (M.getProfileSummary(/* IsCS */ false) == nullptr) {
M.setProfileSummary(Reader->getSummary().getMD(M.getContext()),
ProfileSummary::PSK_Sample);
[... 28 lines not shown ...]	if (Remapper) {
if (*MapName != OrigName && !MapName->empty())
SymbolMap.insert(std::make_pair(*MapName, F));
}
}
}
assert(SymbolMap.count(StringRef()) == 0 &&
"No empty StringRef should be added in SymbolMap");
		  if (ReportProfileStaleness)
		    MatchingManager->detectProfileMismatch();
		hoy: nit: rename `matchProfiles` to something like `detectProfileMismatch` directly? The name of the switch sounds like only mismatch detection should be done here.
		wlei (author): Fixed, thanks!

		  bool retval = false;
		  for (auto *F : buildFunctionOrder(M, CG)) {
		    assert(!F->isDeclaration());
		    clearFunctionData();
		    retval |= runOnFunction(*F, AM);
		  }

		  // Account for cold calls not inlined....

llvm/test/Transforms/SampleProfile/Inputs/profile-mismatch.prof

This file was added.

				main:30:0
				 0: 0
				 1.1: 0
				 3: 10 matched:10
				 4: 10
				 5: 10 bar_mismatch:10
				 8: 0
				 7: foo:10
				  1: 5
				  2: 5
				bar:10:10
				 1: 10
				matched:10:10
				 1: 10

llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-profile-mismatch.prof

This file was added.

				main:30:0
				 1: 0
				 12: 10 matched:10
				 20: 10 bar:10
				 13: foo_mismatch:10
				  1: 10
				  !CFGChecksum: 4294967295
				 !CFGChecksum: 844635331715433
				bar:10:10
				 1: 10
				 !CFGChecksum: 42949671295
				matched:10:10
				 1: 10
				 !CFGChecksum: 4294967295
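
As a rough illustration of the function-hash metric, the sketch below compares each profiled function's `!CFGChecksum` from the profile above against the checksum the IR carries in the `!llvm.pseudo_probe_desc` metadata of the accompanying `.ll` test. The types and the function `countHashMismatches` are hypothetical and much simpler than the patch's `SampleProfileMatcher`. It assumes that only IR functions with a top-level profile entry are counted, and that the function's header total (`main:30:0` gives 30) is used as its sample count.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical summary of one top-level function in the profile.
struct ProfiledFunction {
  uint64_t TotalSamples; // first number in the "name:total:head" header
  uint64_t CFGChecksum;  // value of the function's !CFGChecksum line
};

struct HashStats {
  uint64_t NumMismatched = 0, Total = 0;
  uint64_t MismatchedSamples = 0, TotalSamples = 0;
};

// Walk the IR functions, look each one up in the profile, and count the
// entries whose checksum differs from the one recorded in the IR.
HashStats countHashMismatches(
    const std::map<std::string, uint64_t> &IRChecksums,
    const std::map<std::string, ProfiledFunction> &Profile) {
  HashStats S;
  for (const auto &Entry : IRChecksums) {
    auto It = Profile.find(Entry.first);
    if (It == Profile.end())
      continue; // no top-level profile for this function, nothing to compare
    ++S.Total;
    S.TotalSamples += It->second.TotalSamples;
    if (It->second.CFGChecksum != Entry.second) {
      ++S.NumMismatched;
      S.MismatchedSamples += It->second.TotalSamples;
    }
  }
  assert(S.NumMismatched <= S.Total);
  return S;
}

// Data transcribed from the test inputs of this revision.
HashStats probeTestHashStats() {
  std::map<std::string, uint64_t> IR = {{"foo", 4294967295ULL},
                                        {"bar", 4294967295ULL},
                                        {"matched", 4294967295ULL},
                                        {"main", 844635331715433ULL}};
  std::map<std::string, ProfiledFunction> Profile = {
      {"main", {30, 844635331715433ULL}},
      {"bar", {10, 42949671295ULL}}, // differs from the IR checksum
      {"matched", {10, 4294967295ULL}}};
  return countHashMismatches(IR, Profile);
}
```

Under these assumptions only `bar` is flagged, giving 1 of 3 functions and 10 of 50 samples. `foo` is skipped because it has no top-level entry in the profile.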

llvm/test/Transforms/SampleProfile/profile-mismatch.ll

This file was added.

				; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/profile-mismatch.prof -report-profile-staleness -S 2>%t
				; RUN: FileCheck %s --input-file %t

				; CHECK: (2/3) of callsites' profile are invalid and (20/30) of samples are discarded due to callsite location mismatch.

				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@x = dso_local global i32 0, align 4, !dbg !0

				; Function Attrs: nounwind uwtable
				define dso_local i32 @foo(i32 noundef %x) #0 !dbg !12 {
				entry:
				%y = alloca i32, align 4
				call void @llvm.dbg.value(metadata i32 %x, metadata !16, metadata !DIExpression()), !dbg !18
				call void @llvm.lifetime.start.p0(i64 4, ptr nonnull %y), !dbg !19
				call void @llvm.dbg.declare(metadata ptr %y, metadata !17, metadata !DIExpression()), !dbg !20
				%add = add nsw i32 %x, 1, !dbg !21
				store volatile i32 %add, ptr %y, align 4, !dbg !20, !tbaa !22
				%y.0. = load volatile i32, ptr %y, align 4, !dbg !26, !tbaa !22
				%add1 = add nsw i32 %y.0., 1, !dbg !27
				call void @llvm.lifetime.end.p0(i64 4, ptr nonnull %y), !dbg !28
				ret i32 %add1, !dbg !29
				}

				; Function Attrs: mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn
				declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

				; Function Attrs: argmemonly mustprogress nocallback nofree nosync nounwind willreturn
				declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #2

				; Function Attrs: argmemonly mustprogress nocallback nofree nosync nounwind willreturn
				declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #2

				; Function Attrs: noinline nounwind uwtable
				define dso_local i32 @bar(i32 noundef %x) #3 !dbg !30 {
				entry:
				call void @llvm.dbg.value(metadata i32 %x, metadata !32, metadata !DIExpression()), !dbg !33
				%add = add nsw i32 %x, 2, !dbg !34
				ret i32 %add, !dbg !35
				}

				; Function Attrs: noinline nounwind uwtable
				define dso_local i32 @matched(i32 noundef %x) #3 !dbg !36 {
				entry:
				call void @llvm.dbg.value(metadata i32 %x, metadata !38, metadata !DIExpression()), !dbg !39
				%add = add nsw i32 %x, 3, !dbg !40
				ret i32 %add, !dbg !41
				}

				; Function Attrs: nounwind uwtable
				define dso_local i32 @main() #0 !dbg !42 {
				entry:
				call void @llvm.dbg.value(metadata i32 0, metadata !46, metadata !DIExpression()), !dbg !52
				br label %for.cond, !dbg !53

				for.cond: ; preds = %for.cond.cleanup3, %entry
				%i.0 = phi i32 [ 0, %entry ], [ %inc8, %for.cond.cleanup3 ], !dbg !52
				call void @llvm.dbg.value(metadata i32 %i.0, metadata !46, metadata !DIExpression()), !dbg !52
				%cmp = icmp ult i32 %i.0, 1000, !dbg !54
				br i1 %cmp, label %for.body, label %for.cond.cleanup, !dbg !56

				for.cond.cleanup: ; preds = %for.cond
				ret i32 0, !dbg !58

				for.body: ; preds = %for.cond
				call void @llvm.dbg.value(metadata i32 0, metadata !48, metadata !DIExpression()), !dbg !59
				br label %for.cond1, !dbg !60

				for.cond1: ; preds = %for.body4, %for.body
				%a.0 = phi i32 [ 0, %for.body ], [ %inc, %for.body4 ], !dbg !59
				call void @llvm.dbg.value(metadata i32 %a.0, metadata !48, metadata !DIExpression()), !dbg !59
				%cmp2 = icmp ult i32 %a.0, 10000, !dbg !61
				br i1 %cmp2, label %for.body4, label %for.cond.cleanup3, !dbg !64

				for.cond.cleanup3: ; preds = %for.cond1
				%inc8 = add nuw nsw i32 %i.0, 1, !dbg !66
				call void @llvm.dbg.value(metadata i32 %inc8, metadata !46, metadata !DIExpression()), !dbg !52
				br label %for.cond, !dbg !68, !llvm.loop !69

				for.body4: ; preds = %for.cond1
				%0 = load volatile i32, ptr @x, align 4, !dbg !73, !tbaa !22
				%call = call i32 @matched(i32 noundef %0), !dbg !75
				store volatile i32 %call, ptr @x, align 4, !dbg !76, !tbaa !22
				%1 = load volatile i32, ptr @x, align 4, !dbg !77, !tbaa !22
				%call5 = call i32 @foo(i32 noundef %1), !dbg !78
				store volatile i32 %call5, ptr @x, align 4, !dbg !79, !tbaa !22
				%2 = load volatile i32, ptr @x, align 4, !dbg !80, !tbaa !22
				%call6 = call i32 @bar(i32 noundef %2), !dbg !81
				store volatile i32 %call6, ptr @x, align 4, !dbg !82, !tbaa !22
				%inc = add nuw nsw i32 %a.0, 1, !dbg !83
				call void @llvm.dbg.value(metadata i32 %inc, metadata !48, metadata !DIExpression()), !dbg !59
				br label %for.cond1, !dbg !85, !llvm.loop !86
				}

				; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
				declare void @llvm.dbg.value(metadata, metadata, metadata) #4

				attributes #0 = { nounwind uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "use-sample-profile" }
				attributes #1 = { mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn }
				attributes #2 = { argmemonly mustprogress nocallback nofree nosync nounwind willreturn }
				attributes #3 = { noinline nounwind uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "use-sample-profile" }
				attributes #4 = { nocallback nofree nosync nounwind readnone speculatable willreturn }

				!llvm.dbg.cu = !{!2}
				!llvm.module.flags = !{!7, !8, !9, !10}
				!llvm.ident = !{!11}

				!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
				!1 = distinct !DIGlobalVariable(name: "x", scope: !2, file: !3, line: 1, type: !5, isLocal: false, isDefinition: true)
				!2 = distinct !DICompileUnit(language: DW_LANG_C99, file: !3, producer: "clang", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, globals: !4, splitDebugInlining: false, debugInfoForProfiling: true, nameTableKind: None)
				!3 = !DIFile(filename: "test.c", directory: "test")
				!4 = !{!0}
				!5 = !DIDerivedType(tag: DW_TAG_volatile_type, baseType: !6)
				!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!7 = !{i32 7, !"Dwarf Version", i32 5}
				!8 = !{i32 2, !"Debug Info Version", i32 3}
				!9 = !{i32 1, !"wchar_size", i32 4}
				!10 = !{i32 7, !"uwtable", i32 2}
				!11 = !{!""}
				!12 = distinct !DISubprogram(name: "foo", scope: !3, file: !3, line: 2, type: !13, scopeLine: 2, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !15)
				!13 = !DISubroutineType(types: !14)
				!14 = !{!6, !6}
				!15 = !{!16, !17}
				!16 = !DILocalVariable(name: "x", arg: 1, scope: !12, file: !3, line: 2, type: !6)
				!17 = !DILocalVariable(name: "y", scope: !12, file: !3, line: 3, type: !5)
				!18 = !DILocation(line: 0, scope: !12)
				!19 = !DILocation(line: 3, column: 3, scope: !12)
				!20 = !DILocation(line: 3, column: 16, scope: !12)
				!21 = !DILocation(line: 3, column: 22, scope: !12)
				!22 = !{!23, !23, i64 0}
				!23 = !{!"int", !24, i64 0}
				!24 = !{!"omnipotent char", !25, i64 0}
				!25 = !{!"Simple C/C++ TBAA"}
				!26 = !DILocation(line: 4, column: 10, scope: !12)
				!27 = !DILocation(line: 4, column: 12, scope: !12)
				!28 = !DILocation(line: 5, column: 1, scope: !12)
				!29 = !DILocation(line: 4, column: 3, scope: !12)
				!30 = distinct !DISubprogram(name: "bar", scope: !3, file: !3, line: 7, type: !13, scopeLine: 7, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !31)
				!31 = !{!32}
				!32 = !DILocalVariable(name: "x", arg: 1, scope: !30, file: !3, line: 7, type: !6)
				!33 = !DILocation(line: 0, scope: !30)
				!34 = !DILocation(line: 8, column: 12, scope: !30)
				!35 = !DILocation(line: 8, column: 3, scope: !30)
				!36 = distinct !DISubprogram(name: "matched", scope: !3, file: !3, line: 11, type: !13, scopeLine: 11, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !37)
				!37 = !{!38}
				!38 = !DILocalVariable(name: "x", arg: 1, scope: !36, file: !3, line: 11, type: !6)
				!39 = !DILocation(line: 0, scope: !36)
				!40 = !DILocation(line: 12, column: 12, scope: !36)
				!41 = !DILocation(line: 12, column: 3, scope: !36)
				!42 = distinct !DISubprogram(name: "main", scope: !3, file: !3, line: 15, type: !43, scopeLine: 15, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !45)
				!43 = !DISubroutineType(types: !44)
				!44 = !{!6}
				!45 = !{!46, !48}
				!46 = !DILocalVariable(name: "i", scope: !47, file: !3, line: 16, type: !6)
				!47 = distinct !DILexicalBlock(scope: !42, file: !3, line: 16, column: 3)
				!48 = !DILocalVariable(name: "a", scope: !49, file: !3, line: 17, type: !6)
				!49 = distinct !DILexicalBlock(scope: !50, file: !3, line: 17, column: 5)
				!50 = distinct !DILexicalBlock(scope: !51, file: !3, line: 16, column: 34)
				!51 = distinct !DILexicalBlock(scope: !47, file: !3, line: 16, column: 3)
				!52 = !DILocation(line: 0, scope: !47)
				!53 = !DILocation(line: 16, column: 8, scope: !47)
				!54 = !DILocation(line: 16, column: 21, scope: !55)
				!55 = !DILexicalBlockFile(scope: !51, file: !3, discriminator: 2)
				!56 = !DILocation(line: 16, column: 3, scope: !57)
				!57 = !DILexicalBlockFile(scope: !47, file: !3, discriminator: 2)
				!58 = !DILocation(line: 23, column: 1, scope: !42)
				!59 = !DILocation(line: 0, scope: !49)
				!60 = !DILocation(line: 17, column: 10, scope: !49)
				!61 = !DILocation(line: 17, column: 23, scope: !62)
				!62 = !DILexicalBlockFile(scope: !63, file: !3, discriminator: 2)
				!63 = distinct !DILexicalBlock(scope: !49, file: !3, line: 17, column: 5)
				!64 = !DILocation(line: 17, column: 5, scope: !65)
				!65 = !DILexicalBlockFile(scope: !49, file: !3, discriminator: 2)
				!66 = !DILocation(line: 16, column: 30, scope: !67)
				!67 = !DILexicalBlockFile(scope: !51, file: !3, discriminator: 4)
				!68 = !DILocation(line: 16, column: 3, scope: !67)
				!69 = distinct !{!69, !70, !71, !72}
				!70 = !DILocation(line: 16, column: 3, scope: !47)
				!71 = !DILocation(line: 22, column: 3, scope: !47)
				!72 = !{!"llvm.loop.mustprogress"}
				!73 = !DILocation(line: 18, column: 19, scope: !74)
				!74 = distinct !DILexicalBlock(scope: !63, file: !3, line: 17, column: 37)
				!75 = !DILocation(line: 18, column: 11, scope: !74)
				!76 = !DILocation(line: 18, column: 9, scope: !74)
				!77 = !DILocation(line: 19, column: 15, scope: !74)
				!78 = !DILocation(line: 19, column: 11, scope: !74)
				!79 = !DILocation(line: 19, column: 9, scope: !74)
				!80 = !DILocation(line: 20, column: 15, scope: !74)
				!81 = !DILocation(line: 20, column: 11, scope: !74)
				!82 = !DILocation(line: 20, column: 9, scope: !74)
				!83 = !DILocation(line: 17, column: 33, scope: !84)
				!84 = !DILexicalBlockFile(scope: !63, file: !3, discriminator: 4)
				!85 = !DILocation(line: 17, column: 5, scope: !84)
				!86 = distinct !{!86, !87, !88, !72}
				!87 = !DILocation(line: 17, column: 5, scope: !49)
				!88 = !DILocation(line: 21, column: 5, scope: !49)
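
The `(2/3)` and `(20/30)` figures in the CHECK line can be reproduced by hand. `main` starts at line 15 and its three calls sit at lines 18, 19 and 20, so the IR callsites are at line offsets 3 (`matched`), 4 (`foo`) and 5 (`bar`). The profile records callsites at offsets 3 (`matched`), 5 (`bar_mismatch`) and 7 (`foo`). The sketch below is a simplified model of that comparison, not the patch's implementation: the types and `countCallsiteMismatches` are hypothetical, and it assumes a profiled callsite matches only when the IR has a call to the same target at the same offset.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one callsite found in the profile.
struct ProfiledCallsite {
  uint32_t LineOffset; // line offset from the start of the enclosing function
  std::string Callee;  // call target name recorded in the profile
  uint64_t Samples;    // samples attributed to this callsite
};

struct CallsiteStats {
  uint64_t NumMismatched = 0, Total = 0;
  uint64_t MismatchedSamples = 0, TotalSamples = 0;
};

// A profiled callsite is treated as mismatched when the IR has no call at
// that offset, or the call at that offset targets a different function.
CallsiteStats countCallsiteMismatches(
    const std::map<uint32_t, std::string> &IRCallsites,
    const std::vector<ProfiledCallsite> &Profile) {
  CallsiteStats S;
  for (const auto &CS : Profile) {
    ++S.Total;
    S.TotalSamples += CS.Samples;
    auto It = IRCallsites.find(CS.LineOffset);
    if (It == IRCallsites.end() || It->second != CS.Callee) {
      ++S.NumMismatched;
      S.MismatchedSamples += CS.Samples;
    }
  }
  assert(S.NumMismatched <= S.Total);
  return S;
}

// Data transcribed from profile-mismatch.prof and the debug locations of
// the calls in @main.
CallsiteStats mainCallsiteStats() {
  std::map<uint32_t, std::string> IR = {{3, "matched"}, {4, "foo"}, {5, "bar"}};
  std::vector<ProfiledCallsite> Profile = {
      {3, "matched", 10}, {5, "bar_mismatch", 10}, {7, "foo", 10}};
  return countCallsiteMismatches(IR, Profile);
}
```

Under this model, offset 5 fails on the target name and offset 7 fails because the IR has no call there, which gives 2 of 3 callsites and 20 of 30 samples.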

llvm/test/Transforms/SampleProfile/pseudo-probe-profile-mismatch.ll

This file was added.

				; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/pseudo-probe-profile-mismatch.prof -report-profile-staleness -S 2>%t
				; RUN: FileCheck %s --input-file %t

				; CHECK: (1/3) of functions' profile are invalid and (10/50) of samples are discarded due to function hash mismatch.
				; CHECK: (2/3) of callsites' profile are invalid and (20/30) of samples are discarded due to callsite location mismatch.

				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@x = dso_local global i32 0, align 4, !dbg !0

				; Function Attrs: nounwind uwtable
				define dso_local i32 @foo(i32 noundef %x) #0 !dbg !16 {
				entry:
				%y = alloca i32, align 4
				call void @llvm.dbg.value(metadata i32 %x, metadata !20, metadata !DIExpression()), !dbg !22
				call void @llvm.lifetime.start.p0(i64 4, ptr nonnull %y), !dbg !23
				call void @llvm.dbg.declare(metadata ptr %y, metadata !21, metadata !DIExpression()), !dbg !24
				call void @llvm.pseudoprobe(i64 6699318081062747564, i64 1, i32 0, i64 -1), !dbg !25
				%add = add nsw i32 %x, 1, !dbg !26
				store volatile i32 %add, ptr %y, align 4, !dbg !24, !tbaa !27
				%y.0. = load volatile i32, ptr %y, align 4, !dbg !31, !tbaa !27
				%add1 = add nsw i32 %y.0., 1, !dbg !32
				call void @llvm.lifetime.end.p0(i64 4, ptr nonnull %y), !dbg !33
				ret i32 %add1, !dbg !34
				}

				; Function Attrs: mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn
				declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

				; Function Attrs: argmemonly mustprogress nocallback nofree nosync nounwind willreturn
				declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #2

				; Function Attrs: argmemonly mustprogress nocallback nofree nosync nounwind willreturn
				declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #2

				; Function Attrs: noinline nounwind uwtable
				define dso_local i32 @bar(i32 noundef %x) #3 !dbg !35 {
				entry:
				call void @llvm.dbg.value(metadata i32 %x, metadata !37, metadata !DIExpression()), !dbg !38
				call void @llvm.pseudoprobe(i64 -2012135647395072713, i64 1, i32 0, i64 -1), !dbg !39
				%add = add nsw i32 %x, 2, !dbg !40
				ret i32 %add, !dbg !41
				}

				; Function Attrs: noinline nounwind uwtable
				define dso_local i32 @matched(i32 noundef %x) #3 !dbg !42 {
				entry:
				call void @llvm.dbg.value(metadata i32 %x, metadata !44, metadata !DIExpression()), !dbg !45
				call void @llvm.pseudoprobe(i64 -5844448289301669773, i64 1, i32 0, i64 -1), !dbg !46
				%add = add nsw i32 %x, 3, !dbg !47
				ret i32 %add, !dbg !48
				}

				; Function Attrs: nounwind uwtable
				define dso_local i32 @main() #0 !dbg !49 {
				entry:
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 1, i32 0, i64 -1), !dbg !59
				call void @llvm.dbg.value(metadata i32 0, metadata !53, metadata !DIExpression()), !dbg !60
				br label %for.cond, !dbg !61

				for.cond: ; preds = %for.cond.cleanup3, %entry
				%i.0 = phi i32 [ 0, %entry ], [ %inc8, %for.cond.cleanup3 ], !dbg !60
				call void @llvm.dbg.value(metadata i32 %i.0, metadata !53, metadata !DIExpression()), !dbg !60
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 2, i32 0, i64 -1), !dbg !62
				%cmp = icmp ult i32 %i.0, 1000, !dbg !64
				br i1 %cmp, label %for.body, label %for.cond.cleanup, !dbg !65

				for.cond.cleanup: ; preds = %for.cond
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 3, i32 0, i64 -1), !dbg !67
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 11, i32 0, i64 -1), !dbg !68
				ret i32 0, !dbg !68

				for.body: ; preds = %for.cond
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 4, i32 0, i64 -1), !dbg !69
				call void @llvm.dbg.value(metadata i32 0, metadata !55, metadata !DIExpression()), !dbg !70
				br label %for.cond1, !dbg !71

				for.cond1: ; preds = %for.body4, %for.body
				%a.0 = phi i32 [ 0, %for.body ], [ %inc, %for.body4 ], !dbg !70
				call void @llvm.dbg.value(metadata i32 %a.0, metadata !55, metadata !DIExpression()), !dbg !70
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 5, i32 0, i64 -1), !dbg !72
				%cmp2 = icmp ult i32 %a.0, 10000, !dbg !75
				br i1 %cmp2, label %for.body4, label %for.cond.cleanup3, !dbg !76

				for.cond.cleanup3: ; preds = %for.cond1
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 6, i32 0, i64 -1), !dbg !67
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 9, i32 0, i64 -1), !dbg !78
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 10, i32 0, i64 -1), !dbg !79
				%inc8 = add nuw nsw i32 %i.0, 1, !dbg !79
				call void @llvm.dbg.value(metadata i32 %inc8, metadata !53, metadata !DIExpression()), !dbg !60
				br label %for.cond, !dbg !81, !llvm.loop !82

				for.body4: ; preds = %for.cond1
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 7, i32 0, i64 -1), !dbg !86
				%0 = load volatile i32, ptr @x, align 4, !dbg !86, !tbaa !27
				%call = call i32 @matched(i32 noundef %0), !dbg !88
				store volatile i32 %call, ptr @x, align 4, !dbg !90, !tbaa !27
				%1 = load volatile i32, ptr @x, align 4, !dbg !91, !tbaa !27
				%call5 = call i32 @foo(i32 noundef %1), !dbg !92
				store volatile i32 %call5, ptr @x, align 4, !dbg !94, !tbaa !27
				%2 = load volatile i32, ptr @x, align 4, !dbg !95, !tbaa !27
				%call6 = call i32 @bar(i32 noundef %2), !dbg !96
				store volatile i32 %call6, ptr @x, align 4, !dbg !98, !tbaa !27
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 8, i32 0, i64 -1), !dbg !99
				%inc = add nuw nsw i32 %a.0, 1, !dbg !99
				call void @llvm.dbg.value(metadata i32 %inc, metadata !55, metadata !DIExpression()), !dbg !70
				br label %for.cond1, !dbg !101, !llvm.loop !102
				}

				; Function Attrs: inaccessiblememonly mustprogress nocallback nofree nosync nounwind willreturn
				declare void @llvm.pseudoprobe(i64, i64, i32, i64) #4

				; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
				declare void @llvm.dbg.value(metadata, metadata, metadata) #5

				attributes #0 = { nounwind uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "use-sample-profile" }
				attributes #1 = { mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn }
				attributes #2 = { argmemonly mustprogress nocallback nofree nosync nounwind willreturn }
				attributes #3 = { noinline nounwind uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "use-sample-profile" }
				attributes #4 = { inaccessiblememonly mustprogress nocallback nofree nosync nounwind willreturn }
				attributes #5 = { nocallback nofree nosync nounwind readnone speculatable willreturn }

				!llvm.dbg.cu = !{!2}
				!llvm.module.flags = !{!7, !8, !9, !10}
				!llvm.ident = !{!11}
				!llvm.pseudo_probe_desc = !{!12, !13, !14, !15}

				!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
				!1 = distinct !DIGlobalVariable(name: "x", scope: !2, file: !3, line: 1, type: !5, isLocal: false, isDefinition: true)
				!2 = distinct !DICompileUnit(language: DW_LANG_C99, file: !3, producer: "", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, globals: !4, splitDebugInlining: false, debugInfoForProfiling: true, nameTableKind: None)
				!3 = !DIFile(filename: "test.c", directory: "")
				!4 = !{!0}
				!5 = !DIDerivedType(tag: DW_TAG_volatile_type, baseType: !6)
				!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!7 = !{i32 7, !"Dwarf Version", i32 5}
				!8 = !{i32 2, !"Debug Info Version", i32 3}
				!9 = !{i32 1, !"wchar_size", i32 4}
				!10 = !{i32 7, !"uwtable", i32 2}
				!11 = !{!""}
				!12 = !{i64 6699318081062747564, i64 4294967295, !"foo"}
				!13 = !{i64 -2012135647395072713, i64 4294967295, !"bar"}
				!14 = !{i64 -5844448289301669773, i64 4294967295, !"matched"}
				!15 = !{i64 -2624081020897602054, i64 844635331715433, !"main"}
				!16 = distinct !DISubprogram(name: "foo", scope: !3, file: !3, line: 2, type: !17, scopeLine: 2, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !19)
				!17 = !DISubroutineType(types: !18)
				!18 = !{!6, !6}
				!19 = !{!20, !21}
				!20 = !DILocalVariable(name: "x", arg: 1, scope: !16, file: !3, line: 2, type: !6)
				!21 = !DILocalVariable(name: "y", scope: !16, file: !3, line: 3, type: !5)
				!22 = !DILocation(line: 0, scope: !16)
				!23 = !DILocation(line: 3, column: 3, scope: !16)
				!24 = !DILocation(line: 3, column: 16, scope: !16)
				!25 = !DILocation(line: 3, column: 20, scope: !16)
				!26 = !DILocation(line: 3, column: 22, scope: !16)
				!27 = !{!28, !28, i64 0}
				!28 = !{!"int", !29, i64 0}
				!29 = !{!"omnipotent char", !30, i64 0}
				!30 = !{!"Simple C/C++ TBAA"}
				!31 = !DILocation(line: 4, column: 10, scope: !16)
				!32 = !DILocation(line: 4, column: 12, scope: !16)
				!33 = !DILocation(line: 5, column: 1, scope: !16)
				!34 = !DILocation(line: 4, column: 3, scope: !16)
				!35 = distinct !DISubprogram(name: "bar", scope: !3, file: !3, line: 7, type: !17, scopeLine: 7, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !36)
				!36 = !{!37}
				!37 = !DILocalVariable(name: "x", arg: 1, scope: !35, file: !3, line: 7, type: !6)
				!38 = !DILocation(line: 0, scope: !35)
				!39 = !DILocation(line: 8, column: 10, scope: !35)
				!40 = !DILocation(line: 8, column: 12, scope: !35)
				!41 = !DILocation(line: 8, column: 3, scope: !35)
				!42 = distinct !DISubprogram(name: "matched", scope: !3, file: !3, line: 11, type: !17, scopeLine: 11, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !43)
				!43 = !{!44}
				!44 = !DILocalVariable(name: "x", arg: 1, scope: !42, file: !3, line: 11, type: !6)
				!45 = !DILocation(line: 0, scope: !42)
				!46 = !DILocation(line: 12, column: 10, scope: !42)
				!47 = !DILocation(line: 12, column: 12, scope: !42)
				!48 = !DILocation(line: 12, column: 3, scope: !42)
				!49 = distinct !DISubprogram(name: "main", scope: !3, file: !3, line: 15, type: !50, scopeLine: 15, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !52)
				!50 = !DISubroutineType(types: !51)
				!51 = !{!6}
				!52 = !{!53, !55}
				!53 = !DILocalVariable(name: "i", scope: !54, file: !3, line: 16, type: !6)
				!54 = distinct !DILexicalBlock(scope: !49, file: !3, line: 16, column: 3)
				!55 = !DILocalVariable(name: "a", scope: !56, file: !3, line: 17, type: !6)
				!56 = distinct !DILexicalBlock(scope: !57, file: !3, line: 17, column: 5)
				!57 = distinct !DILexicalBlock(scope: !58, file: !3, line: 16, column: 34)
				!58 = distinct !DILexicalBlock(scope: !54, file: !3, line: 16, column: 3)
				!59 = !DILocation(line: 16, column: 12, scope: !54)
				!60 = !DILocation(line: 0, scope: !54)
				!61 = !DILocation(line: 16, column: 8, scope: !54)
				!62 = !DILocation(line: 16, column: 19, scope: !63)
				!63 = !DILexicalBlockFile(scope: !58, file: !3, discriminator: 2)
				!64 = !DILocation(line: 16, column: 21, scope: !63)
				!65 = !DILocation(line: 16, column: 3, scope: !66)
				!66 = !DILexicalBlockFile(scope: !54, file: !3, discriminator: 2)
				!67 = !DILocation(line: 0, scope: !49)
				!68 = !DILocation(line: 23, column: 1, scope: !49)
				!69 = !DILocation(line: 17, column: 14, scope: !56)
				!70 = !DILocation(line: 0, scope: !56)
				!71 = !DILocation(line: 17, column: 10, scope: !56)
				!72 = !DILocation(line: 17, column: 21, scope: !73)
				!73 = !DILexicalBlockFile(scope: !74, file: !3, discriminator: 2)
				!74 = distinct !DILexicalBlock(scope: !56, file: !3, line: 17, column: 5)
				!75 = !DILocation(line: 17, column: 23, scope: !73)
				!76 = !DILocation(line: 17, column: 5, scope: !77)
				!77 = !DILexicalBlockFile(scope: !56, file: !3, discriminator: 2)
				!78 = !DILocation(line: 22, column: 3, scope: !57)
				!79 = !DILocation(line: 16, column: 30, scope: !80)
				!80 = !DILexicalBlockFile(scope: !58, file: !3, discriminator: 4)
				!81 = !DILocation(line: 16, column: 3, scope: !80)
				!82 = distinct !{!82, !83, !84, !85}
				!83 = !DILocation(line: 16, column: 3, scope: !54)
				!84 = !DILocation(line: 22, column: 3, scope: !54)
				!85 = !{!"llvm.loop.mustprogress"}
				!86 = !DILocation(line: 18, column: 19, scope: !87)
				!87 = distinct !DILexicalBlock(scope: !74, file: !3, line: 17, column: 37)
				!88 = !DILocation(line: 18, column: 11, scope: !89)
				!89 = !DILexicalBlockFile(scope: !87, file: !3, discriminator: 186646631)
				!90 = !DILocation(line: 18, column: 9, scope: !87)
				!91 = !DILocation(line: 19, column: 15, scope: !87)
				!92 = !DILocation(line: 19, column: 11, scope: !93)
				!93 = !DILexicalBlockFile(scope: !87, file: !3, discriminator: 186646639)
				!94 = !DILocation(line: 19, column: 9, scope: !87)
				!95 = !DILocation(line: 20, column: 15, scope: !87)
				!96 = !DILocation(line: 20, column: 11, scope: !97)
				!97 = !DILexicalBlockFile(scope: !87, file: !3, discriminator: 186646647)
				!98 = !DILocation(line: 20, column: 9, scope: !87)
				!99 = !DILocation(line: 17, column: 33, scope: !100)
				!100 = !DILexicalBlockFile(scope: !74, file: !3, discriminator: 4)
				!101 = !DILocation(line: 17, column: 5, scope: !100)
				!102 = distinct !{!102, !103, !104, !85}
				!103 = !DILocation(line: 17, column: 5, scope: !56)
				!104 = !DILocation(line: 21, column: 5, scope: !56)

Diff 471011
