This is an archive of the discontinued LLVM Phabricator instance.

Differential D128142

[MemProf] Memprof profile matching and annotation
ClosedPublic

Authored by tejohnson on Jun 19 2022, 10:18 AM.

Details

Reviewers

snehasish
davidxl

Commits

rGf9403ca41e5f: Profile matching and IR annotation for memprof profiles.
rGb1926f308f09: Restore "[MemProf] Memprof profile matching and annotation"
rGa212d8da94d0: [MemProf] Memprof profile matching and annotation

Summary

Profile matching and IR annotation for memprof profiles.

See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]*
* Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceding patch adding the basic
metadata and verification support.

The matching is performed during the normal PGO annotation phase, to
ensure that the inlines applied in the IR at that point are a subset
of the inlines in the profiled binary and thus reflected in the
profile's call stacks. This is important because the call frames are
associated with functions in the profile based on the inlining in the
symbolized call stacks, and this simplifies locating the subset of
profile data relevant for matching onto each function's IR.

The PGOInstrumentationUse pass is enhanced to perform matching for
whatever combination of memprof and regular PGO profile data exists in
the profile.

Using the utilities introduced in D128854:
The memprof profile data for each context is converted to "cold" or
"notcold" based on parameterized thresholds for size, access count, and
lifetime. The memprof allocation contexts are trimmed to the minimal
amount of context required to uniquely identify whether the context is
cold or not cold. For allocations where all profiled contexts have the
same allocation type, no memprof metadata is attached and instead the
allocation call is directly annotated with an attribute specifying the
allocation type. This is the same attribute that will be applied to
allocation calls once cloned for different contexts, and later used
during LibCall simplification to emit allocation hints [4].
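
The classification and context trimming described above can be illustrated with a small standalone sketch. This is not the code from this patch or from D128854 (which uses a call stack trie and IR metadata). The type names, fields, threshold values, and the `minimalDepth` helper are all hypothetical, and for brevity the sketch thresholds only on access density and lifetime. It assumes every call stack is non-empty and is listed with the allocation frame first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical allocation types mirroring the "cold" / "notcold" split.
enum class AllocType { NotCold, Cold };

// Hypothetical per-context profile summary (field names and units are assumptions).
struct ContextProfile {
  std::vector<uint64_t> Stack; // frame ids, allocation frame first
  double AccessesPerByte;      // access count normalized by allocation size
  uint64_t LifetimeMs;         // average lifetime of the allocation
};

// Hypothetical thresholds standing in for the parameterized options.
struct Thresholds {
  double MaxColdAccessesPerByte = 10.0;
  uint64_t MinColdLifetimeMs = 200;
};

// A context is treated as cold when it is both rarely accessed and long lived.
AllocType classify(const ContextProfile &P, const Thresholds &T) {
  bool RarelyAccessed = P.AccessesPerByte < T.MaxColdAccessesPerByte;
  bool LongLived = P.LifetimeMs >= T.MinColdLifetimeMs;
  return (RarelyAccessed && LongLived) ? AllocType::Cold : AllocType::NotCold;
}

// True when every context has the same type. In that case a single attribute
// on the allocation call is enough and no per-context metadata is needed.
bool allSameType(const std::vector<AllocType> &Types) {
  for (AllocType T : Types)
    if (T != Types.front())
      return false;
  return true;
}

// Number of leading frames of context I needed to tell it apart from every
// context of a different type (clamped to the length of its own stack).
std::size_t minimalDepth(const std::vector<ContextProfile> &Ctxs,
                         const std::vector<AllocType> &Types, std::size_t I) {
  std::size_t Depth = 1;
  const std::vector<uint64_t> &A = Ctxs[I].Stack;
  for (std::size_t J = 0; J < Ctxs.size(); ++J) {
    if (Types[J] == Types[I])
      continue; // Same type: no need to distinguish.
    const std::vector<uint64_t> &B = Ctxs[J].Stack;
    std::size_t Common = 0;
    while (Common < A.size() && Common < B.size() && A[Common] == B[Common])
      ++Common;
    // One frame past the shared prefix is required to disambiguate.
    Depth = std::max(Depth, std::min(Common + 1, A.size()));
  }
  return Depth;
}
```

With two contexts that share their first two frames but differ in type, `minimalDepth` reports that three frames must be kept. If all contexts classify the same way, `allSameType` returns true and the trimming step is unnecessary.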

Depends on D128141 and D128854.

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html
[2] https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html
[3] https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165
[4] https://github.com/google/tcmalloc/commit/ab87cf382dc56784f783f3aaa43d6d0465d5f385

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tejohnson created this revision. (Jun 19 2022, 10:18 AM)

Herald added a project: Restricted Project. (Jun 19 2022, 10:18 AM)

Herald added subscribers: Enna1, wenlei, hiraditya, mgorny.

tejohnson requested review of this revision. (Jun 19 2022, 10:18 AM)

Herald added projects: Restricted Project, Restricted Project. (Jun 19 2022, 10:18 AM)

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B170729: Diff 438201. (Jun 19 2022, 10:18 AM)

Herald added a subscriber: ormris. (Jun 19 2022, 10:18 AM)

tejohnson added a child revision: D128143: [MemProf] Update metadata during inlining. (Jun 19 2022, 10:19 AM)

MTC added a subscriber: MTC. (Jun 21 2022, 8:15 PM)

I'm still going through PGOInstrumentation.cpp ...

clang/test/CodeGen/memprof.cpp
16	Just `-no-pie` is better (details: https://reviews.llvm.org/rG103b28902fd6)
llvm/include/llvm/Analysis/MemoryProfileInfo.h
1 ↗	(On Diff #438201)	Can you split out the memory profile info parts into a separate patch? Also I think the interface is simple enough to be able to unit test. Something set up like [1] would be great. Then we can call addCallstack with different annotations followed by buildAndAttachMIBMetaData followed by checking the metadata annotated on the call inst(s). What do you think? [1] https://github.com/llvm/llvm-project/blob/3f8e4169c1c390fd086658c1e51983ee61bff9bc/llvm/unittests/Analysis/FunctionPropertiesAnalysisTest.cpp#L71
40 ↗	(On Diff #438201)	Should this be a doxygen comment block with `///`?
llvm/include/llvm/ProfileData/InstrProfReader.h
333	The RawInstrProfReader shouldn't have the memprof mask set. We have a separate raw binary format which is independent. So this should always return false. Also maybe add a comment to document the fact?
llvm/lib/Analysis/MemoryProfileInfo.cpp
19 ↗	(On Diff #438201)	We use MemoryProfile and MemProf interchangeably. Does it make sense to pick one and make it consistent throughout? Here for eg. the flags begin with "memprof-" but the debug type is "memory-profile-".
102 ↗	(On Diff #438201)	nit: prefer static_cast<uint8_t> here and elsewhere.

tejohnson marked 3 inline comments as done. (Jun 29 2022, 1:44 PM)

tejohnson added inline comments.

llvm/include/llvm/Analysis/MemoryProfileInfo.h
1 ↗	(On Diff #438201)	See D128854. I'll try to rebase this and the follow on inliner patch on top of that when I get a chance.
40 ↗	(On Diff #438201)	Fixed in new patch.
llvm/lib/Analysis/MemoryProfileInfo.cpp
19 ↗	(On Diff #438201)	The clang option is -fmemory-profile for instrumentation, so I've used that some places (e.g. the file names too) for clarity. MemProf is a nice short hand and used in the metadata. I don't have a strong opinion about which name should be used where. I've kept this as is for now in the new patch, let me know what your thoughts are on what is clearer.
102 ↗	(On Diff #438201)	Fixed in new patch.

Rebase on top of D128854 which now includes the extracted Analysis utilities.
I have not yet addressed the other comments on this patch.

tejohnson edited the summary of this revision. (Jun 30 2022, 2:59 PM)

tejohnson added a parent revision: D128854: [MemProf] Add memprof metadata related analysis utilities.

Harbormaster completed remote builds in B173129: Diff 441529. (Jun 30 2022, 3:09 PM)

Enna1 added inline comments. (Jul 1 2022, 3:05 AM)

llvm/lib/Analysis/MemoryBuiltins.cpp
320–322	nit: place the definition of `llvm::isNewLikeFn` just after `llvm::isAllocationFn(const Value *V, function_ref<const TargetLibraryInfo &(Function &)> GetTLI)` ?

tejohnson mentioned this in D128854: [MemProf] Add memprof metadata related analysis utilities. (Jul 21 2022, 12:06 PM)

Address comments

clang/test/CodeGen/memprof.cpp
16	Fixed, here and elsewhere
llvm/include/llvm/ProfileData/InstrProfReader.h
333	Also added an assert.

Harbormaster completed remote builds in B177117: Diff 446986. (Jul 22 2022, 4:06 PM)

snehasish added inline comments. (Jul 25 2022, 4:10 PM)

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1255	Prefer moving this code after the loop, close to where AllocType is used.
1258	I think if you use an llvm::SetVector here instead then you don't need the StackHashSet std::set below. CallstackTrie::addCallstack already accepts an ArrayRef so it won't need to change if we use a SetVector.
1259	nit: It doesn't look like we #include <set> in this file so we are probably relying on having it transitively being included from somewhere.
1282	Consider defining the lambda outside above the condition to reduce indentation. IMO it will be a little easier to follow if it wasn't inlined into the if statement itself.
1303	Is an llvm::Twine a better choice here instead of std::string? I guess it doesn't matter much in error handling code.
1319	LocHashToCallSiteFrame to indicate the value in the map corresponds to an individual frame?
1325	Not using auto over here would be helpful to know that we are indexing into the map below using an uint64_t. Same below.
1336	Should we assert that it was actually found?
1366	Can you add an assert for this?
1371	`DIL != nullptr` is a little easier to follow.
1403	Prefer moving this to a static helper method to reduce the size of the loop body, reduce indentation for this logic and make it more readable overall. Probably creating an functor object on the stack for each instruction that we process is not efficient either.
1420	"First add !memprof metadata ..." -- the ordering of the if-else condition isn't necessary though since only one of the iters can be non-null? We could rewrite the else condition first to reduce the complexity here a bit. E.g.:
	if (CallSitesIter != LocHashToCallSites.end()) { ... continue }
	// Flip the conditions here
	if (!isNewLikeFn() || AllocInfoIter == LocHashToAllocInfo.end()) { continue }
	CallStackTrie AllocTrie; ...
llvm/test/Transforms/PGOProfile/memprof.ll
83	./pgo.exe
96	--check-prefixes=MEMPROF,ALL can be used instead.
109	I suspect that the check lines are redundant. I think FileCheck scans the entire file and groups conditions by prefix. So we could have the 3 run lines followed by a group of prefix checks:
	; ALL-NOT: memprof record not found for function hash
	; ALL-NOT: no profile data available for function
	; MEMPROF-NOT: !prof
	; PGOONLY-NOT: !memprof
	; PGOONLY-NOT: !callsite
llvm/test/Transforms/PGOProfile/memprofmissingfunc.ll
14	Should we use a regex here to make it more resilient since we don't care about the exact hash?

Herald added a subscriber: mingmingl. (Jul 25 2022, 4:10 PM)

mingmingl removed a subscriber: mingmingl. (Jul 25 2022, 4:11 PM)

tejohnson marked 12 inline comments as done. (Aug 30 2022, 10:15 AM)

tejohnson added inline comments.

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1258	Noted, but moot as this has been removed (see below)
1259	Although this std::set has been removed, I use it elsewhere so added the include.
1262	I have removed this handling. It is not correct in the case of mutual recursion involving more than one function, where it will result in nonsensical stack traces. I am deferring handling recursion until later during the LTO phase.
1282	I could do this, but as is it mirrors the structure of the similar handling in readCounters, which has some advantages. wdyt?
1319	The key is the location hash (stack id) of a single frame, but the value is a set of all the CallSites from the profile that reference it.
1336	Added assert after the loop.
1366	In this case "may only" meant "might only", not "may only at most". So I can't assert on anything. This can happen for example if we have a location that corresponds to both an allocation call and another callsite (I've seen this periodically, and can reproduce e.g. with a macro). We would need to use discriminators more widely to better distinguish them in that case (with the handling here we will only match to the allocation call for now - edit: a slight change noted further below ensures this is the case). Will change /may/might/ and add a note.
1420	As noted earlier, it might be in more than one map. But I realized we could sometimes add the callsite metadata, instead of the memprof metadata, to a non-new allocation call (e.g. malloc) when there is a matching location in both maps given the structuring of the handling below. I've changed it so we handle all instructions with matching allocation profile data in the below if statement, and skip adding any metadata if there is matching allocation profile data but it is not isNewLikeFn. I've made the allocation profile matching if statement below have a continue at the end, so that I can remove the indentation further below for the callsite-only matched situation and added an assert there.
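
The restructured per-instruction handling described in the reply above can be sketched as follows. This is a simplified standalone model, not the code in PGOInstrumentation.cpp: `Call`, `Annotation`, and `annotate` are hypothetical names, and plain sets of location hashes stand in for the profile maps.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical outcome of matching one call instruction.
enum class Annotation { None, MemProf, CallSite };

// Hypothetical stand-in for a call instruction: the hash of its debug
// location and whether it calls an operator-new-like allocator.
struct Call {
  uint64_t LocHash;
  bool IsNewLike;
};

std::vector<Annotation> annotate(const std::vector<Call> &Calls,
                                 const std::set<uint64_t> &AllocLocs,
                                 const std::set<uint64_t> &CallSiteLocs) {
  std::vector<Annotation> Result;
  for (const Call &C : Calls) {
    // Allocation profile data takes priority. A location may appear in both
    // sets, so this case is handled first and ends with a continue.
    if (AllocLocs.count(C.LocHash)) {
      // Only new-like allocations get memprof metadata. Other allocation
      // calls (e.g. malloc) get nothing, not even callsite metadata.
      Result.push_back(C.IsNewLike ? Annotation::MemProf : Annotation::None);
      continue;
    }
    // Callsite-only match.
    if (CallSiteLocs.count(C.LocHash)) {
      Result.push_back(Annotation::CallSite);
      continue;
    }
    Result.push_back(Annotation::None);
  }
  return Result;
}
```

A non-new allocation call whose location hash is in both sets ends up with no annotation, which is the behavior the reply says the restructuring was meant to guarantee.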
llvm/test/Transforms/PGOProfile/memprof.ll
96	Done here and elsewhere.
109	It doesn't group them by prefix, but you are right there are a lot of redundant checks in this test. And one that is not correct (see below). I have cleaned this up.
116	The comment here and below and one of the checks is incorrect. This case is testing a pgo+memprof profile.
124	This is not correct for this test. But worked because it matched the first !prof against the first PGO label below after ALL-LABEL. This was just uselessly checking that there were no additional !prof before that. I have removed this and made the earlier similar check a MEMPROFONLY check.

Address comments

Harbormaster completed remote builds in B184188: Diff 456710. (Aug 30 2022, 10:16 AM)

lgtm

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1282	I wasn't a big fan of the existing structure in readCounters but I didn't want to ask you to change the other code. Let's leave it as is for now.
1366	Thanks for the explanation.

This revision is now accepted and ready to land. (Sep 14 2022, 8:58 AM)

This revision was landed with ongoing or failed builds. (Sep 22 2022, 12:49 PM)

Closed by commit rGa212d8da94d0: [MemProf] Memprof profile matching and annotation (authored by tejohnson).

This revision was automatically updated to reflect the committed changes.

tejohnson added a commit: rGa212d8da94d0: [MemProf] Memprof profile matching and annotation.

tejohnson added a reverting change: rG794b7ea960cc: Revert "[MemProf] Memprof profile matching and annotation". (Sep 22 2022, 4:08 PM)

MaskRay added a subscriber: MaskRay. (Sep 22 2022, 6:29 PM)

MaskRay added inline comments.

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1247	You may use BLAKE3 instead of MD5. BLAKE3 is much faster than LLVM's slow MD5 implementation.

MaskRay added inline comments. (Sep 22 2022, 6:30 PM)

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1247	llvm/include/llvm/Support/xxhash.h is also a good choice.

tejohnson marked an inline comment as done. (Sep 23 2022, 11:29 AM)

tejohnson added inline comments.

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1247	Thanks for the tip! I used BLAKE3.

tejohnson added a commit: rGb1926f308f09: Restore "[MemProf] Memprof profile matching and annotation". (Sep 23 2022, 11:38 AM)

tejohnson added a commit: rGf9403ca41e5f: Profile matching and IR annotation for memprof profiles. (Sep 30 2022, 4:46 PM)

I see that this revision was reverted along with D128143, and that D128143 has since been re-committed.
The commit message of D128143 says that it depends on this patch. Does the dependency still exist?

eopXD added a comment. (Oct 4 2022, 7:05 PM)

This comment was removed by eopXD.

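Revision Contents

Path	Size
clang/lib/Frontend/CompilerInvocation.cpp	5 lines
clang/test/CodeGen/Inputs/memprof.exe
clang/test/CodeGen/Inputs/memprof.memprofraw
clang/test/CodeGen/memprof.cpp	35 lines
llvm/include/llvm/Analysis/MemoryBuiltins.h	4 lines
llvm/include/llvm/ProfileData/InstrProfReader.h	21 lines
llvm/lib/Analysis/MemoryBuiltins.cpp	6 lines
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp	277 lines
llvm/test/Transforms/PGOProfile/Inputs/memprof.exe
llvm/test/Transforms/PGOProfile/Inputs/memprof.memprofraw
llvm/test/Transforms/PGOProfile/Inputs/memprof_pgo.profraw
llvm/test/Transforms/PGOProfile/memprof.ll	483 lines
llvm/test/Transforms/PGOProfile/memprofmissingfunc.ll	25 lines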

Diff 462269

clang/lib/Frontend/CompilerInvocation.cpp

[1,300 lines elided]
     unsigned DiagID = Diags.getCustomDiagID(DiagnosticsEngine::Error,
                                             "Error in reading profile %0: %1");
     llvm::handleAllErrors(std::move(E), [&](const llvm::ErrorInfoBase &EI) {
       Diags.Report(DiagID) << ProfileName.str() << EI.message();
     });
     return;
   }
   std::unique_ptr<llvm::IndexedInstrProfReader> PGOReader =
       std::move(ReaderOrErr.get());
-  if (PGOReader->isIRLevelProfile()) {
+  // Currently memprof profiles are only added at the IR level. Mark the profile
+  // type as IR in that case as well and the subsequent matching needs to detect
+  // which is available (might be one or both).
+  if (PGOReader->isIRLevelProfile() || PGOReader->hasMemoryProfile()) {
     if (PGOReader->hasCSIRLevelProfile())
       Opts.setProfileUse(CodeGenOptions::ProfileCSIRInstr);
     else
       Opts.setProfileUse(CodeGenOptions::ProfileIRInstr);
   } else
     Opts.setProfileUse(CodeGenOptions::ProfileClangInstr);
 }

[3,418 lines elided]
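
The effect of the changed condition in CompilerInvocation.cpp can be seen in a small standalone model. The names `ReaderInfo`, `ProfileUse`, and `selectProfileUse` are hypothetical and only mirror the three reader queries and three option values that appear in the hunk. This is a sketch of the decision logic, not the clang code itself.

```cpp
#include <cassert>

// Hypothetical mirror of the three profile-use kinds set in the hunk.
enum class ProfileUse { ClangInstr, IRInstr, CSIRInstr };

// Hypothetical summary of what the indexed reader reports.
struct ReaderInfo {
  bool IsIRLevelProfile;
  bool HasCSIRLevelProfile;
  bool HasMemoryProfile;
};

ProfileUse selectProfileUse(const ReaderInfo &R) {
  // A memprof-only profile is treated like an IR-level profile so that the
  // IR-level use pass runs and can match whichever data is present.
  if (R.IsIRLevelProfile || R.HasMemoryProfile)
    return R.HasCSIRLevelProfile ? ProfileUse::CSIRInstr : ProfileUse::IRInstr;
  return ProfileUse::ClangInstr;
}
```

Before the change, a profile carrying only memprof data would have fallen through to the Clang-instrumentation case. With the added check it selects the IR-level path.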

clang/test/CodeGen/Inputs/memprof.exe

This binary file was added.

Property	Old Value	New Value
File Mode	null	100755

clang/test/CodeGen/Inputs/memprof.memprofraw

This binary file was added.

clang/test/CodeGen/memprof.cpp

This file was added.

// Test if memprof instrumentation and use pass are invoked.
//
// Instrumentation:
// Ensure Pass MemProfilerPass and ModuleMemProfilerPass are invoked.
// RUN: %clang_cc1 -O2 -fmemory-profile %s -fdebug-pass-manager -emit-llvm -o - 2>&1 | FileCheck %s -check-prefix=INSTRUMENT
// INSTRUMENT: Running pass: MemProfilerPass on main
// INSTRUMENT: Running pass: ModuleMemProfilerPass on [module]

// TODO: Use text profile inputs once that is available for memprof.
//
// The following commands were used to compile the source to instrumented
// executables and collect raw binary format profiles:
//
// # Collect memory profile:
// $ clang++ -fuse-ld=lld -no-pie -Wl,--no-rosegment -gmlt \
// -fdebug-info-for-profiling -mno-omit-leaf-frame-pointer \
// -fno-omit-frame-pointer -fno-optimize-sibling-calls -m64 -Wl,-build-id \
// memprof.cpp -o memprof.exe -fmemory-profile
// $ env MEMPROF_OPTIONS=log_path=stdout ./memprof.exe > memprof.memprofraw
//
// RUN: llvm-profdata merge %S/Inputs/memprof.memprofraw --profiled-binary %S/Inputs/memprof.exe -o %t.memprofdata

// Profile use:
// Ensure Pass PGOInstrumentationUse is invoked with the memprof-only profile.
// RUN: %clang_cc1 -O2 -fprofile-instrument-use-path=%t.memprofdata %s -fdebug-pass-manager -emit-llvm -o - 2>&1 | FileCheck %s -check-prefix=USE
// USE: Running pass: PGOInstrumentationUse on [module]

char *foo() {
  return new char[10];
}
int main() {
  char *a = foo();
  delete[] a;
  return 0;
}

llvm/include/llvm/Analysis/MemoryBuiltins.h

[51 lines elided]
 /// Tests if a value is a call or invoke to a library function that
 /// allocates or reallocates memory (either malloc, calloc, realloc, or strdup
 /// like).
 bool isAllocationFn(const Value *V, const TargetLibraryInfo *TLI);
 bool isAllocationFn(const Value *V,
                     function_ref<const TargetLibraryInfo &(Function &)> GetTLI);

 /// Tests if a value is a call or invoke to a library function that
+/// allocates memory via new.
+bool isNewLikeFn(const Value *V, const TargetLibraryInfo *TLI);
+
+/// Tests if a value is a call or invoke to a library function that
 /// allocates memory similar to malloc or calloc.
 bool isMallocOrCallocLikeFn(const Value *V, const TargetLibraryInfo *TLI);

 /// Tests if a value is a call or invoke to a library function that
 /// allocates memory (either malloc, calloc, or strdup like).
 bool isAllocLikeFn(const Value *V, const TargetLibraryInfo *TLI);

 /// Tests if a function is a call or invoke to a library function that
[246 lines elided]

llvm/include/llvm/ProfileData/InstrProfReader.h

[112 lines elided; context: public:]
   virtual bool useDebugInfoCorrelate() const { return false; }

   /// Return true if the profile has single byte counters representing coverage.
   virtual bool hasSingleByteCoverage() const = 0;

   /// Return true if the profile only instruments function entries.
   virtual bool functionEntryOnly() const = 0;

+  /// Return true if profile includes a memory profile.
+  virtual bool hasMemoryProfile() const = 0;
+
   /// Returns a BitsetEnum describing the attributes of the profile. To check
   /// individual attributes prefer using the helpers above.
   virtual InstrProfKind getProfileKind() const = 0;

   /// Return the PGO symtab. There are three different readers:
   /// Raw, Text, and Indexed profile readers. The first two types
   /// of readers are used only by llvm-profdata tool, while the indexed
   /// profile reader is also used by llvm-cov tool and the compiler (
[99 lines elided; context: public:]
   bool hasSingleByteCoverage() const override {
     return static_cast<bool>(ProfileKind & InstrProfKind::SingleByteCoverage);
   }

   bool functionEntryOnly() const override {
     return static_cast<bool>(ProfileKind & InstrProfKind::FunctionEntryOnly);
   }

+  bool hasMemoryProfile() const override {
+    // TODO: Add support for text format memory profiles.
+    return false;
+  }
+
   InstrProfKind getProfileKind() const override { return ProfileKind; }

   /// Read the header.
   Error readHeader() override;

   /// Read a single record.
   Error readNextRecord(NamedInstrProfRecord &Record) override;

[73 lines elided; context: public:]
   bool hasSingleByteCoverage() const override {
     return (Version & VARIANT_MASK_BYTE_COVERAGE) != 0;
   }

   bool functionEntryOnly() const override {
     return (Version & VARIANT_MASK_FUNCTION_ENTRY_ONLY) != 0;
   }

+  bool hasMemoryProfile() const override {
+    // Memory profiles have a separate raw format, so this should never be set.
+    assert(!(Version & VARIANT_MASK_MEMPROF));
+    return false;
+  }
+
   /// Returns a BitsetEnum describing the attributes of the raw instr profile.
   InstrProfKind getProfileKind() const override;

   InstrProfSymtab &getSymtab() override {
     assert(Symtab.get());
     return *Symtab.get();
   }

[128 lines elided; context: struct InstrProfReaderIndexBase {]
   virtual bool atEnd() const = 0;
   virtual void setValueProfDataEndianness(support::endianness Endianness) = 0;
   virtual uint64_t getVersion() const = 0;
   virtual bool isIRLevelProfile() const = 0;
   virtual bool hasCSIRLevelProfile() const = 0;
   virtual bool instrEntryBBEnabled() const = 0;
   virtual bool hasSingleByteCoverage() const = 0;
   virtual bool functionEntryOnly() const = 0;
+  virtual bool hasMemoryProfile() const = 0;
   virtual InstrProfKind getProfileKind() const = 0;
   virtual Error populateSymtab(InstrProfSymtab &) = 0;
 };

 using OnDiskHashTableImplV3 =
     OnDiskIterableChainedHashTable<InstrProfLookupTrait>;

 using MemProfRecordHashTable =
[50 lines elided; context: public:]
   bool hasSingleByteCoverage() const override {
     return (FormatVersion & VARIANT_MASK_BYTE_COVERAGE) != 0;
   }

   bool functionEntryOnly() const override {
     return (FormatVersion & VARIANT_MASK_FUNCTION_ENTRY_ONLY) != 0;
   }

+  bool hasMemoryProfile() const override {
+    return (FormatVersion & VARIANT_MASK_MEMPROF) != 0;
+  }
+
   InstrProfKind getProfileKind() const override;

   Error populateSymtab(InstrProfSymtab &Symtab) override {
     return Symtab.create(HashTable->keys());
   }
 };

 /// Name matcher supporting fuzzy matching of symbol names to names in profiles.
[57 lines elided; context: public:]
   }

   bool hasSingleByteCoverage() const override {
     return Index->hasSingleByteCoverage();
   }

   bool functionEntryOnly() const override { return Index->functionEntryOnly(); }

+  bool hasMemoryProfile() const override { return Index->hasMemoryProfile(); }
+
   /// Returns a BitsetEnum describing the attributes of the indexed instr
   /// profile.
   InstrProfKind getProfileKind() const override {
     return Index->getProfileKind();
   }

   /// Return true if the given buffer is in an indexed instrprof format.
   static bool hasFormat(const MemoryBuffer &DataBuffer);
[69 lines elided]
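
The different `hasMemoryProfile()` overrides above all reduce to how each reader treats one flag bit in its version word. A minimal standalone sketch of that pattern follows. The constant names and bit positions are illustrative assumptions and are not copied from LLVM's `VARIANT_MASK_*` definitions. The two free functions are hypothetical stand-ins for the indexed and raw reader methods.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag bits stored in the high bits of a version word.
// These positions are assumptions made for the example only.
constexpr uint64_t kMaskFunctionEntryOnly = 1ULL << 61;
constexpr uint64_t kMaskMemProf = 1ULL << 62;

// Indexed profiles may carry memprof data, so the bit is tested directly.
bool indexedHasMemoryProfile(uint64_t FormatVersion) {
  return (FormatVersion & kMaskMemProf) != 0;
}

// Raw instrumentation profiles never carry memprof data, because memprof has
// its own raw format. The bit is asserted to be clear and the answer is
// always false.
bool rawHasMemoryProfile(uint64_t Version) {
  assert(!(Version & kMaskMemProf) && "memprof bit set in raw instr profile");
  return false;
}
```

Keeping the assert in the raw reader documents the invariant raised in review: if the bit ever appears there, it indicates a malformed or mis-routed profile, not a supported combination.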

llvm/lib/Analysis/MemoryBuiltins.cpp

[298 lines elided]
 bool llvm::isAllocationFn(
     const Value *V,
     function_ref<const TargetLibraryInfo &(Function &)> GetTLI) {
   return getAllocationData(V, AnyAlloc, GetTLI).has_value() ||
          checkFnAllocKind(V, AllocFnKind::Alloc | AllocFnKind::Realloc);
 }

 /// Tests if a value is a call or invoke to a library function that
+/// allocates memory via new.
+bool llvm::isNewLikeFn(const Value *V, const TargetLibraryInfo *TLI) {
+  return getAllocationData(V, OpNewLike, TLI).hasValue();
+}
+
+/// Tests if a value is a call or invoke to a library function that
 /// allocates uninitialized memory (such as malloc).
 static bool isMallocLikeFn(const Value *V, const TargetLibraryInfo *TLI) {
   return getAllocationData(V, MallocOrOpNewLike, TLI).has_value();
 }

 /// Tests if a value is a call or invoke to a library function that
 /// allocates uninitialized memory with alignment (such as aligned_alloc).
 static bool isAlignedAllocLikeFn(const Value *V, const TargetLibraryInfo *TLI) {
   return getAllocationData(V, AlignedAllocLike, TLI).has_value();
 }

 /// Tests if a value is a call or invoke to a library function that
 /// allocates zero-filled memory (such as calloc).
 static bool isCallocLikeFn(const Value *V, const TargetLibraryInfo *TLI) {
   return getAllocationData(V, CallocLike, TLI).has_value();
 }

 /// Tests if a value is a call or invoke to a library function that
[938 lines elided]

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp

Show First 20 Lines • Show All 59 Lines • ▼ Show 20 Lines
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/ADT/iterator.h"		#include "llvm/ADT/iterator.h"
#include "llvm/ADT/iterator_range.h"		#include "llvm/ADT/iterator_range.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"		#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/BranchProbabilityInfo.h"		#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/CFG.h"		#include "llvm/Analysis/CFG.h"
#include "llvm/Analysis/EHPersonalities.h"		#include "llvm/Analysis/EHPersonalities.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
		#include "llvm/Analysis/MemoryBuiltins.h"
		#include "llvm/Analysis/MemoryProfileInfo.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"		#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Comdat.h"		#include "llvm/IR/Comdat.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Instrumentation.h"		#include "llvm/Transforms/Instrumentation.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/MisExpect.h"		#include "llvm/Transforms/Utils/MisExpect.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"		#include "llvm/Transforms/Utils/ModuleUtils.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
		#include <map>
#include <memory>		#include <memory>
#include <numeric>		#include <numeric>
		#include <set>
#include <string>		#include <string>
#include <unordered_map>		#include <unordered_map>
#include <utility>		#include <utility>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;
		using namespace llvm::memprof;
using ProfileCount = Function::ProfileCount;		using ProfileCount = Function::ProfileCount;
using VPCandidateInfo = ValueProfileCollector::CandidateInfo;		using VPCandidateInfo = ValueProfileCollector::CandidateInfo;

#define DEBUG_TYPE "pgo-instrumentation"		#define DEBUG_TYPE "pgo-instrumentation"

STATISTIC(NumOfPGOInstrument, "Number of edges instrumented.");		STATISTIC(NumOfPGOInstrument, "Number of edges instrumented.");
STATISTIC(NumOfPGOSelectInsts, "Number of select instruction instrumented.");		STATISTIC(NumOfPGOSelectInsts, "Number of select instruction instrumented.");
STATISTIC(NumOfPGOMemIntrinsics, "Number of mem intrinsics instrumented.");		STATISTIC(NumOfPGOMemIntrinsics, "Number of mem intrinsics instrumented.");
STATISTIC(NumOfPGOEdge, "Number of edges.");		STATISTIC(NumOfPGOEdge, "Number of edges.");
STATISTIC(NumOfPGOBB, "Number of basic-blocks.");		STATISTIC(NumOfPGOBB, "Number of basic-blocks.");
STATISTIC(NumOfPGOSplit, "Number of critical edge splits.");		STATISTIC(NumOfPGOSplit, "Number of critical edge splits.");
STATISTIC(NumOfPGOFunc, "Number of functions having valid profile counts.");		STATISTIC(NumOfPGOFunc, "Number of functions having valid profile counts.");
STATISTIC(NumOfPGOMismatch, "Number of functions having mismatch profile.");		STATISTIC(NumOfPGOMismatch, "Number of functions having mismatch profile.");
STATISTIC(NumOfPGOMissing, "Number of functions without profile.");		STATISTIC(NumOfPGOMissing, "Number of functions without profile.");
		STATISTIC(NumOfMemProfMissing, "Number of functions without memory profile.");
STATISTIC(NumOfPGOICall, "Number of indirect call value instrumentations.");		STATISTIC(NumOfPGOICall, "Number of indirect call value instrumentations.");
STATISTIC(NumOfCSPGOInstrument, "Number of edges instrumented in CSPGO.");		STATISTIC(NumOfCSPGOInstrument, "Number of edges instrumented in CSPGO.");
STATISTIC(NumOfCSPGOSelectInsts,		STATISTIC(NumOfCSPGOSelectInsts,
"Number of select instruction instrumented in CSPGO.");		"Number of select instruction instrumented in CSPGO.");
STATISTIC(NumOfCSPGOMemIntrinsics,		STATISTIC(NumOfCSPGOMemIntrinsics,
"Number of mem intrinsics instrumented in CSPGO.");		"Number of mem intrinsics instrumented in CSPGO.");
STATISTIC(NumOfCSPGOEdge, "Number of edges in CSPGO.");		STATISTIC(NumOfCSPGOEdge, "Number of edges in CSPGO.");
STATISTIC(NumOfCSPGOBB, "Number of basic-blocks in CSPGO.");		STATISTIC(NumOfCSPGOBB, "Number of basic-blocks in CSPGO.");
static cl::opt<std::string> PGOTraceFuncHash(		static cl::opt<std::string> PGOTraceFuncHash(
"pgo-trace-func-hash", cl::init("-"), cl::Hidden,		"pgo-trace-func-hash", cl::init("-"), cl::Hidden,
cl::value_desc("function name"),		cl::value_desc("function name"),
cl::desc("Trace the hash of the function with this name."));		cl::desc("Trace the hash of the function with this name."));

static cl::opt<unsigned> PGOFunctionSizeThreshold(		static cl::opt<unsigned> PGOFunctionSizeThreshold(
"pgo-function-size-threshold", cl::Hidden,		"pgo-function-size-threshold", cl::Hidden,
cl::desc("Do not instrument functions smaller than this threshold"));		cl::desc("Do not instrument functions smaller than this threshold"));

		static cl::opt<bool> MatchMemProf(
		"pgo-match-memprof", cl::init(true), cl::Hidden,
		cl::desc("Perform matching and annotation of memprof profiles."));

namespace llvm {		namespace llvm {
// Command line option to turn on CFG dot dump after profile annotation.		// Command line option to turn on CFG dot dump after profile annotation.
// Defined in Analysis/BlockFrequencyInfo.cpp: -pgo-view-counts		// Defined in Analysis/BlockFrequencyInfo.cpp: -pgo-view-counts
extern cl::opt<PGOViewCountsType> PGOViewCounts;		extern cl::opt<PGOViewCountsType> PGOViewCounts;

// Command line option to specify the name of the function for CFG dump		// Command line option to specify the name of the function for CFG dump
// Defined in Analysis/BlockFrequencyInfo.cpp: -view-bfi-func-name=		// Defined in Analysis/BlockFrequencyInfo.cpp: -view-bfi-func-name=
extern cl::opt<std::string> ViewBlockFreqFuncName;		extern cl::opt<std::string> ViewBlockFreqFuncName;
▲ Show 20 Lines • Show All 192 Lines • ▼ Show 20 Lines	private:
std::unordered_multimap<Comdat *, GlobalValue *> &ComdatMembers;		std::unordered_multimap<Comdat *, GlobalValue *> &ComdatMembers;

ValueProfileCollector VPC;		ValueProfileCollector VPC;

void computeCFGHash();		void computeCFGHash();
void renameComdatFunction();		void renameComdatFunction();

public:		public:
		const TargetLibraryInfo &TLI;
std::vector<std::vector<VPCandidateInfo>> ValueSites;		std::vector<std::vector<VPCandidateInfo>> ValueSites;
SelectInstVisitor SIVisitor;		SelectInstVisitor SIVisitor;
std::string FuncName;		std::string FuncName;
GlobalVariable *FuncNameVar;		GlobalVariable *FuncNameVar;

// CFG hash value for this function.		// CFG hash value for this function.
uint64_t FunctionHash = 0;		uint64_t FunctionHash = 0;

Show All 22 Lines	public:

FuncPGOInstrumentation(		FuncPGOInstrumentation(
Function &Func, TargetLibraryInfo &TLI,		Function &Func, TargetLibraryInfo &TLI,
std::unordered_multimap<Comdat *, GlobalValue *> &ComdatMembers,		std::unordered_multimap<Comdat *, GlobalValue *> &ComdatMembers,
bool CreateGlobalVar = false, BranchProbabilityInfo *BPI = nullptr,		bool CreateGlobalVar = false, BranchProbabilityInfo *BPI = nullptr,
BlockFrequencyInfo *BFI = nullptr, bool IsCS = false,		BlockFrequencyInfo *BFI = nullptr, bool IsCS = false,
bool InstrumentFuncEntry = true)		bool InstrumentFuncEntry = true)
: F(Func), IsCS(IsCS), ComdatMembers(ComdatMembers), VPC(Func, TLI),		: F(Func), IsCS(IsCS), ComdatMembers(ComdatMembers), VPC(Func, TLI),
ValueSites(IPVK_Last + 1), SIVisitor(Func),		TLI(TLI), ValueSites(IPVK_Last + 1), SIVisitor(Func),
MST(F, InstrumentFuncEntry, BPI, BFI) {		MST(F, InstrumentFuncEntry, BPI, BFI) {
// This should be done before CFG hash computation.		// This should be done before CFG hash computation.
SIVisitor.countSelects(Func);		SIVisitor.countSelects(Func);
ValueSites[IPVK_MemOPSize] = VPC.get(IPVK_MemOPSize);		ValueSites[IPVK_MemOPSize] = VPC.get(IPVK_MemOPSize);
if (!IsCS) {		if (!IsCS) {
NumOfPGOSelectInsts += SIVisitor.getNumOfSelectInsts();		NumOfPGOSelectInsts += SIVisitor.getNumOfSelectInsts();
NumOfPGOMemIntrinsics += ValueSites[IPVK_MemOPSize].size();		NumOfPGOMemIntrinsics += ValueSites[IPVK_MemOPSize].size();
NumOfPGOBB += MST.BBInfos.size();		NumOfPGOBB += MST.BBInfos.size();
▲ Show 20 Lines • Show All 462 Lines • ▼ Show 20 Lines	PGOUseFunc(Function &Func, Module *Modu, TargetLibraryInfo &TLI,
FuncInfo(Func, TLI, ComdatMembers, false, BPI, BFIin, IsCS,		FuncInfo(Func, TLI, ComdatMembers, false, BPI, BFIin, IsCS,
InstrumentFuncEntry),		InstrumentFuncEntry),
FreqAttr(FFA_Normal), IsCS(IsCS) {}		FreqAttr(FFA_Normal), IsCS(IsCS) {}

// Read counts for the instrumented BB from profile.		// Read counts for the instrumented BB from profile.
bool readCounters(IndexedInstrProfReader *PGOReader, bool &AllZeros,		bool readCounters(IndexedInstrProfReader *PGOReader, bool &AllZeros,
InstrProfRecord::CountPseudoKind &PseudoKind);		InstrProfRecord::CountPseudoKind &PseudoKind);

		// Read memprof data for the instrumented function from profile.
		bool readMemprof(IndexedInstrProfReader *PGOReader);

// Populate the counts for all BBs.		// Populate the counts for all BBs.
void populateCounters();		void populateCounters();

// Set the branch weights based on the count values.		// Set the branch weights based on the count values.
void setBranchWeights();		void setBranchWeights();

// Annotate the value profile call sites for all value kind.		// Annotate the value profile call sites for all value kind.
void annotateValueSites();		void annotateValueSites();
▲ Show 20 Lines • Show All 184 Lines • ▼ Show 20 Lines	static void annotateFunctionWithHashMismatch(Function &F,
}		}

MDBuilder MDB(ctx);		MDBuilder MDB(ctx);
Names.push_back(MDB.createString(MetadataName));		Names.push_back(MDB.createString(MetadataName));
MDNode *MD = MDTuple::get(ctx, Names);		MDNode *MD = MDTuple::get(ctx, Names);
F.setMetadata(LLVMContext::MD_annotation, MD);		F.setMetadata(LLVMContext::MD_annotation, MD);
}		}

		static void addCallsiteMetadata(Instruction &I,
		std::vector<uint64_t> &InlinedCallStack,
		LLVMContext &Ctx) {
		I.setMetadata(LLVMContext::MD_callsite,
		buildCallstackMetadata(InlinedCallStack, Ctx));
		}

		static hash_code computeStackId(GlobalValue::GUID Function, uint32_t LineOffset,
		uint32_t Column) {
		return hash_combine(Function, LineOffset, Column);
		MaskRay (Done): You may use BLAKE3 instead of MD5. BLAKE3 is much faster than LLVM's slow MD5 implementation.
		MaskRay (Not Done): llvm/include/llvm/Support/xxhash.h is also a good choice.
		tejohnson (Author, Done): Thanks for the tip! I used BLAKE3.
		}

		static hash_code computeStackId(const memprof::Frame &Frame) {
		return computeStackId(Frame.Function, Frame.LineOffset, Frame.Column);
		}

		static void addCallStack(CallStackTrie &AllocTrie,
		const AllocationInfo *AllocInfo) {
		snehasish (Done): Prefer moving this code after the loop, close to where AllocType is used.
		SmallVector<uint64_t> StackIds;
		for (auto StackFrame : AllocInfo->CallStack)
		StackIds.push_back(computeStackId(StackFrame));
		snehasish (Done): I think if you use an llvm::SetVector here instead then you don't need the StackHashSet std::set below. CallstackTrie::addCallstack already accepts an ArrayRef so it won't need to change if we use a SetVector.
		tejohnson (Author, Done): Noted, but moot as this has been removed (see below).
		auto AllocType = getAllocType(AllocInfo->Info.getMaxAccessCount(),
		snehasish (Done): nit: It doesn't look like we #include <set> in this file so we are probably relying on having it transitively being included from somewhere.
		tejohnson (Author, Done): Although this std::set has been removed, I use it elsewhere so added the include.
		AllocInfo->Info.getMinSize(),
		AllocInfo->Info.getMinLifetime());
		AllocTrie.addCallStack(AllocType, StackIds);
		tejohnson (Author, Done): I have removed this handling. It is not correct in the case of mutual recursion involving more than one function where it will result in non-sensical stack traces. I am deferring handling recursion until later during the LTO phase.
		}

		// Helper to compare the InlinedCallStack computed from an instruction's debug
		// info to a list of Frames from profile data (either the allocation data or a
		// callsite). For callsites, the StartIndex to use in the Frame array may be
		// non-zero.
		static bool
		stackFrameIncludesInlinedCallStack(ArrayRef<Frame> ProfileCallStack,
		ArrayRef<uint64_t> InlinedCallStack,
		unsigned StartIndex = 0) {
		auto StackFrame = ProfileCallStack.begin() + StartIndex;
		auto InlCallStackIter = InlinedCallStack.begin();
		for (; StackFrame != ProfileCallStack.end() &&
		InlCallStackIter != InlinedCallStack.end();
		++StackFrame, ++InlCallStackIter) {
		uint64_t StackId = computeStackId(*StackFrame);
		if (StackId != *InlCallStackIter)
		return false;
		}
		// Return true if we found and matched all stack ids from the call
		snehasish (Not Done): Consider defining the lambda outside above the condition to reduce indentation. IMO it will be a little easier to follow if it wasn't inlined into the if statement itself.
		tejohnson (Author, Done): I could do this, but as is it mirrors the structure of the similar handling in readCounters, which has some advantages. wdyt?
		snehasish (Done): I wasn't a big fan of the existing structure in readCounters but I didn't want to ask you to change the other code. Let's leave it as is for now.
		// instruction.
		return InlCallStackIter == InlinedCallStack.end();
		}
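
The prefix comparison performed by the helper above can be tried outside of LLVM. The following is a minimal sketch under stated assumptions: `Frame` is a hypothetical stand-in for `memprof::Frame` with only the three fields used for matching, and `toyStackId` is an illustrative id function, not the hash used in the patch. `std::vector` replaces `ArrayRef`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for memprof::Frame (assumption, not the LLVM type).
struct Frame {
  uint64_t Function;   // GUID of the function the frame belongs to
  uint32_t LineOffset; // line offset from the start of that function
  uint32_t Column;
};

// Illustrative id only. It is unique for Function < 2^32 and
// LineOffset, Column < 2^16; the patch uses a real hash instead.
inline uint64_t toyStackId(const Frame &F) {
  return (F.Function << 32) ^ (uint64_t(F.LineOffset) << 16) ^ F.Column;
}

// Returns true when every id in InlinedCallStack (leaf first) matches the
// profile frames beginning at StartIndex. The profile stack may be longer
// than the inlined stack, but not shorter.
inline bool profileStackHasInlinedPrefix(
    const std::vector<Frame> &ProfileCallStack,
    const std::vector<uint64_t> &InlinedCallStack,
    std::size_t StartIndex = 0) {
  assert(StartIndex <= ProfileCallStack.size());
  std::size_t P = StartIndex, I = 0;
  for (; P < ProfileCallStack.size() && I < InlinedCallStack.size(); ++P, ++I)
    if (toyStackId(ProfileCallStack[P]) != InlinedCallStack[I])
      return false;
  // Matched only if the inlined stack was fully consumed.
  return I == InlinedCallStack.size();
}
```

The final check on `I` is what rejects an inlined stack that runs past the end of the profile frames, which mirrors the `InlCallStackIter == InlinedCallStack.end()` return in the patch.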

		bool PGOUseFunc::readMemprof(IndexedInstrProfReader *PGOReader) {
		if (!MatchMemProf)
		return true;

		auto &Ctx = M->getContext();

		auto FuncGUID = Function::getGUID(FuncInfo.FuncName);
		Expected<memprof::MemProfRecord> MemProfResult =
		PGOReader->getMemProfRecord(FuncGUID);
		if (Error E = MemProfResult.takeError()) {
		handleAllErrors(std::move(E), [&](const InstrProfError &IPE) {
		auto Err = IPE.get();
		bool SkipWarning = false;
		LLVM_DEBUG(dbgs() << "Error in reading profile for Func "
		<< FuncInfo.FuncName << ": ");
		if (Err == instrprof_error::unknown_function) {
		NumOfMemProfMissing++;
		snehasish (Done): Is an llvm::Twine a better choice here instead of std::string? I guess it doesn't matter much in error handling code.
		SkipWarning = !PGOWarnMissing;
		LLVM_DEBUG(dbgs() << "unknown function");
		} else if (Err == instrprof_error::hash_mismatch) {
		SkipWarning =
		NoPGOWarnMismatch ||
		(NoPGOWarnMismatchComdatWeak &&
		(F.hasComdat() ||
		F.getLinkage() == GlobalValue::AvailableExternallyLinkage));
		LLVM_DEBUG(dbgs() << "hash mismatch (skip=" << SkipWarning << ")");
		}

		if (SkipWarning)
		return;

		std::string Msg =
		(IPE.message() + Twine(" ") + F.getName().str() + Twine(" Hash = ") +
		snehasish (Not Done): LocHashToCallSiteFrame to indicate the value in the map corresponds to an individual frame?
		tejohnson (Author, Done): The key is the location hash (stack id) of a single frame, but the value is a set of all the CallSites from the profile that reference it.
		std::to_string(FuncInfo.FunctionHash))
		.str();

		Ctx.diagnose(
		DiagnosticInfoPGOProfile(M->getName().data(), Msg, DS_Warning));
		});
		snehasish (Done): Not using auto over here would be helpful to know that we are indexing into the map below using an uint64_t. Same below.
		return false;
		}

		// Build maps of the location hash to all profile data with that leaf location
		// (allocation info and the callsites).
		std::map<uint64_t, std::set<const AllocationInfo *>> LocHashToAllocInfo;
		// For the callsites we need to record the index of the associated frame in
		// the frame array (see comments below where the map entries are added).
		std::map<uint64_t, std::set<std::pair<const SmallVector<Frame> *, unsigned>>>
		LocHashToCallSites;
		const auto MemProfRec = std::move(MemProfResult.get());
		snehasish (Done): Should we assert that it was actually found?
		tejohnson (Author, Done): Added assert after the loop.
		for (auto &AI : MemProfRec.AllocSites) {
		// Associate the allocation info with the leaf frame. The later matching
		// code will match any inlined call sequences in the IR with a longer prefix
		// of call stack frames.
		uint64_t StackId = computeStackId(AI.CallStack[0]);
		LocHashToAllocInfo[StackId].insert(&AI);
		}
		for (auto &CS : MemProfRec.CallSites) {
		// Need to record all frames from leaf up to and including this function,
		// as any of these may or may not have been inlined at this point.
		unsigned Idx = 0;
		for (auto &StackFrame : CS) {
		uint64_t StackId = computeStackId(StackFrame);
		LocHashToCallSites[StackId].insert(std::make_pair(&CS, Idx++));
		// Once we find this function, we can stop recording.
		if (StackFrame.Function == FuncGUID)
		break;
		}
		assert(Idx <= CS.size() && CS[Idx - 1].Function == FuncGUID);
		}

		auto GetOffset = [](const DILocation *DIL) {
		return (DIL->getLine() - DIL->getScope()->getSubprogram()->getLine()) &
		0xffff;
		};

		// Now walk the instructions, looking up the associated profile data using
		// debug locations.
		for (auto &BB : F) {
		for (auto &I : BB) {
		snehasish (Not Done): Can you add an assert for this?
		tejohnson (Author, Done): In this case "may only" meant "might only", not "may only at most". So I can't assert on anything. This can happen for example if we have a location that corresponds to both an allocation call and another callsite (I've seen this periodically, and can reproduce e.g. with a macro). We would need to use discriminators more widely to better distinguish them in that case (with the handling here we will only match to the allocation call for now - edit: a slight change noted further below ensures this is the case). Will change /may/might/ and add a note.
		snehasish (Done): Thanks for the explanation.
		if (I.isDebugOrPseudoInst())
		continue;
		// We are only interested in calls (allocation or interior call stack
		// context calls).
		auto *CI = dyn_cast<CallBase>(&I);
		snehasish (Done): `DIL != nullptr` is a little easier to follow.
		if (!CI)
		continue;
		auto *CalledFunction = CI->getCalledFunction();
		if (CalledFunction && CalledFunction->isIntrinsic())
		continue;
		// List of call stack ids computed from the location hashes on debug
		// locations (leaf to inlined at root).
		std::vector<uint64_t> InlinedCallStack;
		// Was the leaf location found in one of the profile maps?
		bool LeafFound = false;
		// If leaf was found in a map, iterators pointing to its location in both
		// of the maps. It might exist in neither, one, or both (the latter case
		// can happen because we don't currently have discriminators to
		// distinguish the case when a single line/col maps to both an allocation
		// and another callsite).
		std::map<uint64_t, std::set<const AllocationInfo *>>::iterator
		AllocInfoIter;
		std::map<uint64_t, std::set<std::pair<const SmallVector<Frame> *,
		unsigned>>>::iterator CallSitesIter;
		for (const DILocation *DIL = I.getDebugLoc(); DIL != nullptr;
		DIL = DIL->getInlinedAt()) {
		// Use C++ linkage name if possible. Need to compile with
		// -fdebug-info-for-profiling to get linkage name.
		StringRef Name = DIL->getScope()->getSubprogram()->getLinkageName();
		if (Name.empty())
		Name = DIL->getScope()->getSubprogram()->getName();
		auto CalleeGUID = Function::getGUID(Name);
		auto StackId =
		computeStackId(CalleeGUID, GetOffset(DIL), DIL->getColumn());
		// LeafFound will only be false on the first iteration, since we either
		// set it true or break out of the loop below.
		if (!LeafFound) {
		snehasish (Done): Prefer moving this to a static helper method to reduce the size of the loop body, reduce indentation for this logic and make it more readable overall. Creating a functor object on the stack for each instruction that we process is probably not efficient either.
		AllocInfoIter = LocHashToAllocInfo.find(StackId);
		CallSitesIter = LocHashToCallSites.find(StackId);
		// Check if the leaf is in one of the maps. If not, no need to look
		// further at this call.
		if (AllocInfoIter == LocHashToAllocInfo.end() &&
		CallSitesIter == LocHashToCallSites.end())
		break;
		LeafFound = true;
		}
		InlinedCallStack.push_back(StackId);
		}
		// If leaf not in either of the maps, skip inst.
		if (!LeafFound)
		continue;

		// First add !memprof metadata from allocation info, if we found the
		// instruction's leaf location in that map, and if the rest of the
		snehasish (Not Done): "First add !memprof metadata ..." -- the ordering of the if-else condition isn't necessary though since only one of the iters can be non-null? We could rewrite the else condition first to reduce the complexity here a bit. E.g. `if (CallSitesIter != LocHashToCallSites.end()) { ... continue }`, then flip the conditions here: `if (!isNewLikeFn() || AllocInfoIter == LocHashToAllocInfo.end()) { continue }`, followed by `CallStackTrie AllocTrie; ...`
		tejohnson (Author, Done): As noted earlier, it might be in more than one map. But I realized we could sometimes add the callsite metadata, instead of the memprof metadata, to a non-new allocation call (e.g. malloc) when there is a matching location in both maps given the structuring of the handling below. I've changed it so we handle all instructions with matching allocation profile data in the below if statement, and skip adding any metadata if there is matching allocation profile data but it is not isNewLikeFn. I've made the allocation profile matching if statement below have a continue at the end, so that I can remove the indentation further below for the callsite-only matched situation and added an assert there.
		// instruction's locations match the prefix Frame locations on an
		// allocation context with the same leaf.
		if (AllocInfoIter != LocHashToAllocInfo.end()) {
		// Only consider allocations via new, to reduce unnecessary metadata,
		// since those are the only allocations that will be targeted initially.
		if (!isNewLikeFn(CI, &FuncInfo.TLI))
		continue;
		// We may match this instruction's location list to multiple MIB
		// contexts. Add them to a Trie specialized for trimming the contexts to
		// the minimal needed to disambiguate contexts with unique behavior.
		CallStackTrie AllocTrie;
		for (auto *AllocInfo : AllocInfoIter->second) {
		// Check the full inlined call stack against this one.
		// If we found and thus matched all frames on the call, include
		// this MIB.
		if (stackFrameIncludesInlinedCallStack(AllocInfo->CallStack,
		InlinedCallStack))
		addCallStack(AllocTrie, AllocInfo);
		}
		// We might not have matched any to the full inlined call stack.
		// But if we did, create and attach metadata, or a function attribute if
		// all contexts have identical profiled behavior.
		if (!AllocTrie.empty()) {
		// MemprofMDAttached will be false if a function attribute was
		// attached.
		bool MemprofMDAttached = AllocTrie.buildAndAttachMIBMetadata(CI);
		assert(MemprofMDAttached == I.hasMetadata(LLVMContext::MD_memprof));
		if (MemprofMDAttached) {
		// Add callsite metadata for the instruction's location list so that
		// it is simpler later on to identify which part of the MIB contexts
		// are from this particular instruction (including during inlining,
		// when the callsite metadata will be updated appropriately).
		// FIXME: can this be changed to strip out the matching stack
		// context ids from the MIB contexts and not add any callsite
		// metadata here to save space?
		addCallsiteMetadata(I, InlinedCallStack, Ctx);
		}
		}
		continue;
		}

		// Otherwise, add callsite metadata. If we reach here then we found the
		// instruction's leaf location in the callsites map and not the allocation
		// map.
		assert(CallSitesIter != LocHashToCallSites.end());
		for (auto CallStackIdx : CallSitesIter->second) {
		// If we found and thus matched all frames on the call, create and
		// attach call stack metadata.
		if (stackFrameIncludesInlinedCallStack(
		*CallStackIdx.first, InlinedCallStack, CallStackIdx.second)) {
		addCallsiteMetadata(I, InlinedCallStack, Ctx);
		// Only need to find one with a matching call stack and add a single
		// callsite metadata.
		break;
		}
		}
		}
		}

		return true;
		}
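
The callsite map built at the start of `readMemprof` records, for each profiled call stack, every frame from the leaf up to and including the function being matched, together with that frame's index. Below is a minimal standalone sketch of that indexing step. It is an illustration under assumptions: `Frame`, `toyStackId`, and `indexCallSites` are hypothetical names, the id function is not the hash used in the patch, and standard containers replace the LLVM ones.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for memprof::Frame (assumption, not the LLVM type).
struct Frame {
  uint64_t Function;
  uint32_t LineOffset;
  uint32_t Column;
};
using CallStack = std::vector<Frame>;
// A profiled call stack plus the index of the frame that produced the key.
using CallSiteRef = std::pair<const CallStack *, unsigned>;
using CallSiteMap = std::map<uint64_t, std::set<CallSiteRef>>;

// Illustrative id only; unique for small field values.
inline uint64_t toyStackId(const Frame &F) {
  return (F.Function << 32) ^ (uint64_t(F.LineOffset) << 16) ^ F.Column;
}

// Index every frame from the leaf (index 0) up to and including the first
// frame that belongs to FuncGUID. Frames above that point are callers of the
// function and are not recorded.
inline CallSiteMap indexCallSites(const std::vector<CallStack> &CallSites,
                                  uint64_t FuncGUID) {
  CallSiteMap Map;
  for (const CallStack &CS : CallSites) {
    unsigned Idx = 0;
    for (const Frame &F : CS) {
      Map[toyStackId(F)].insert({&CS, Idx++});
      if (F.Function == FuncGUID)
        break; // Found this function; stop recording.
    }
  }
  return Map;
}
```

Storing the index alongside the call stack pointer is what lets the later matching step start its prefix comparison at the right frame when the instruction's leaf location is not the profiled leaf.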

// Read the profile from ProfileFileName and assign the value to the		// Read the profile from ProfileFileName and assign the value to the
// instrumented BB and the edges. This function also updates ProgramMaxCount.		// instrumented BB and the edges. This function also updates ProgramMaxCount.
// Return true if the profile is successfully read, and false on errors.		// Return true if the profile is successfully read, and false on errors.
bool PGOUseFunc::readCounters(IndexedInstrProfReader *PGOReader, bool &AllZeros,		bool PGOUseFunc::readCounters(IndexedInstrProfReader *PGOReader, bool &AllZeros,
InstrProfRecord::CountPseudoKind &PseudoKind) {		InstrProfRecord::CountPseudoKind &PseudoKind) {
auto &Ctx = M->getContext();		auto &Ctx = M->getContext();
uint64_t MismatchedFuncSum = 0;		uint64_t MismatchedFuncSum = 0;
Expected<InstrProfRecord> Result = PGOReader->getInstrProfRecord(		Expected<InstrProfRecord> Result = PGOReader->getInstrProfRecord(
▲ Show 20 Lines • Show All 537 Lines • ▼ Show 20 Lines	if (!PGOReader) {
Ctx.diagnose(DiagnosticInfoPGOProfile(ProfileFileName.data(),		Ctx.diagnose(DiagnosticInfoPGOProfile(ProfileFileName.data(),
StringRef("Cannot get PGOReader")));		StringRef("Cannot get PGOReader")));
return false;		return false;
}		}
if (!PGOReader->hasCSIRLevelProfile() && IsCS)		if (!PGOReader->hasCSIRLevelProfile() && IsCS)
return false;		return false;

// TODO: might need to change the warning once the clang option is finalized.		// TODO: might need to change the warning once the clang option is finalized.
if (!PGOReader->isIRLevelProfile()) {		if (!PGOReader->isIRLevelProfile() && !PGOReader->hasMemoryProfile()) {
Ctx.diagnose(DiagnosticInfoPGOProfile(		Ctx.diagnose(DiagnosticInfoPGOProfile(
ProfileFileName.data(), "Not an IR level instrumentation profile"));		ProfileFileName.data(), "Not an IR level instrumentation profile"));
return false;		return false;
}		}
if (PGOReader->hasSingleByteCoverage()) {		if (PGOReader->hasSingleByteCoverage()) {
Ctx.diagnose(DiagnosticInfoPGOProfile(		Ctx.diagnose(DiagnosticInfoPGOProfile(
ProfileFileName.data(),		ProfileFileName.data(),
"Cannot use coverage profiles for optimization"));		"Cannot use coverage profiles for optimization"));
Show All 30 Lines	for (auto &F : M) {
auto &TLI = LookupTLI(F);		auto &TLI = LookupTLI(F);
auto *BPI = LookupBPI(F);		auto *BPI = LookupBPI(F);
auto *BFI = LookupBFI(F);		auto *BFI = LookupBFI(F);
// Split indirectbr critical edges here before computing the MST rather than		// Split indirectbr critical edges here before computing the MST rather than
// later in getInstrBB() to avoid invalidating it.		// later in getInstrBB() to avoid invalidating it.
SplitIndirectBrCriticalEdges(F, /*IgnoreBlocksWithoutPHI=*/false, BPI, BFI);		SplitIndirectBrCriticalEdges(F, /*IgnoreBlocksWithoutPHI=*/false, BPI, BFI);
PGOUseFunc Func(F, &M, TLI, ComdatMembers, BPI, BFI, PSI, IsCS,		PGOUseFunc Func(F, &M, TLI, ComdatMembers, BPI, BFI, PSI, IsCS,
InstrumentFuncEntry);		InstrumentFuncEntry);
		// Read and match memprof first since we do this via debug info and can
		// match even if there is an IR mismatch detected for regular PGO below.
		if (PGOReader->hasMemoryProfile())
		Func.readMemprof(PGOReader.get());

		if (!PGOReader->isIRLevelProfile())
		continue;

// When PseudoKind is set to a value other than InstrProfRecord::NotPseudo,		// When PseudoKind is set to a value other than InstrProfRecord::NotPseudo,
// it means the profile for the function is unrepresentative and this		// it means the profile for the function is unrepresentative and this
// function is actually hot / warm. We will reset the function hot / cold		// function is actually hot / warm. We will reset the function hot / cold
// attribute and drop all the profile counters.		// attribute and drop all the profile counters.
InstrProfRecord::CountPseudoKind PseudoKind = InstrProfRecord::NotPseudo;		InstrProfRecord::CountPseudoKind PseudoKind = InstrProfRecord::NotPseudo;

		// When AllMinusOnes is true, it means the profile for the function
		// is unrepresentative and this function is actually hot. Set the
		// entry count of the function to be multiple times of hot threshold
		// and drop all its internal counters.
		bool AllMinusOnes = false;
bool AllZeros = false;		bool AllZeros = false;
if (!Func.readCounters(PGOReader.get(), AllZeros, PseudoKind))		if (!Func.readCounters(PGOReader.get(), AllZeros, PseudoKind))
continue;		continue;
if (AllZeros) {		if (AllZeros) {
F.setEntryCount(ProfileCount(0, Function::PCT_Real));		F.setEntryCount(ProfileCount(0, Function::PCT_Real));
if (Func.getProgramMaxCount() != 0)		if (Func.getProgramMaxCount() != 0)
ColdFunctions.push_back(&F);		ColdFunctions.push_back(&F);
continue;		continue;
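
The ordering in this hunk (memprof matching first, regular PGO counters second) can be modelled in isolation. The following is a simplified sketch rather than LLVM code: `FakeReader`, `FakeFunc`, and `annotate` are hypothetical stand-ins introduced only for illustration, and the `IRMatchesProfile` flag is an assumed way of modelling `readCounters()` failing on an IR mismatch.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the indexed profile reader.
struct FakeReader {
  bool HasMemoryProfile;
  bool IsIRLevelProfile;
};

// Hypothetical stand-in for one function being annotated.
struct FakeFunc {
  std::string Name;
  bool IRMatchesProfile;            // false models an IR mismatch for regular PGO
  std::vector<std::string> Applied; // annotations applied, in order
};

// Mirrors the order of operations in the loop body: memprof is matched first
// (it relies on debug info), then regular PGO counters are read only when an
// IR-level profile is present and the IR matches.
void annotate(const FakeReader &R, std::vector<FakeFunc> &Funcs) {
  for (FakeFunc &F : Funcs) {
    if (R.HasMemoryProfile)
      F.Applied.push_back("memprof");

    if (!R.IsIRLevelProfile)
      continue;

    if (!F.IRMatchesProfile)
      continue; // models readCounters() returning false

    F.Applied.push_back("pgo");
  }
}
```

In this model a function whose IR no longer matches the PGO profile still receives the memprof annotation, which is the behaviour the code comment in the hunk describes.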

llvm/test/Transforms/PGOProfile/Inputs/memprof.exe

This binary file was added.

Property	Old Value	New Value
File Mode	null	100755

llvm/test/Transforms/PGOProfile/Inputs/memprof.memprofraw

This binary file was added.

llvm/test/Transforms/PGOProfile/Inputs/memprof_pgo.profraw

This binary file was added.

llvm/test/Transforms/PGOProfile/memprof.ll

This file was added.

				;; Tests memprof profile matching (with and without instrumentation profiles).

				;; TODO: Use text profile inputs once that is available for memprof.

				;; The input IR and raw profiles have been generated from the following source:
				;;
				;; #include <stdlib.h>
				;; #include <string.h>
				;; #include <unistd.h>
				;; char *foo() {
				;;   return new char[10];
				;; }
				;; char *foo2() {
				;;   return foo();
				;; }
				;; char *bar() {
				;;   return foo2();
				;; }
				;; char *baz() {
				;;   return foo2();
				;; }
				;; char *recurse(unsigned n) {
				;;   if (!n)
				;;     return foo();
				;;   return recurse(n-1);
				;; }
				;; int main(int argc, char **argv) {
				;;   // Test allocations with different combinations of stack contexts and
				;;   // coldness (based on lifetime, since they are all accessed a single time
				;;   // per byte via the memset).
				;;   char *a = new char[10];
				;;   char *b = new char[10];
				;;   char *c = foo();
				;;   char *d = foo();
				;;   char *e = bar();
				;;   char *f = baz();
				;;   memset(a, 0, 10);
				;;   memset(b, 0, 10);
				;;   memset(c, 0, 10);
				;;   memset(d, 0, 10);
				;;   memset(e, 0, 10);
				;;   memset(f, 0, 10);
				;;   // a and c have short lifetimes
				;;   delete[] a;
				;;   delete[] c;
				;;   // b, d, e, and f have long lifetimes and will be detected as cold by default.
				;;   sleep(200);
				;;   delete[] b;
				;;   delete[] d;
				;;   delete[] e;
				;;   delete[] f;
				;;   // Loop ensures the two calls to recurse have stack contexts that only differ
				;;   // in one level of recursion. We should get two stack contexts reflecting the
				;;   // different levels of recursion and different allocation behavior (since the
				;;   // first has a very long lifetime and the second has a short lifetime).
				;;   for (unsigned i = 0; i < 2; i++) {
				;;     char *g = recurse(i + 3);
				;;     memset(g, 0, 10);
				;;     if (!i)
				;;       sleep(200);
				;;     delete[] g;
				;;   }
				;;   return 0;
				;; }
				;;
				;; The following commands were used to compile the source to instrumented
				;; executables and collect raw binary format profiles:
				;;
				;; # Collect memory profile:
				;; $ clang++ -fuse-ld=lld -no-pie -Wl,--no-rosegment -gmlt \
				;; -fdebug-info-for-profiling -mno-omit-leaf-frame-pointer \
				;; -fno-omit-frame-pointer -fno-optimize-sibling-calls -m64 -Wl,-build-id \
				;; memprof.cc -o memprof.exe -fmemory-profile
				;; $ env MEMPROF_OPTIONS=log_path=stdout ./memprof.exe > memprof.memprofraw
				;;
				;; # Collect IR PGO profile:
				;; $ clang++ -fuse-ld=lld -no-pie -Wl,--no-rosegment -gmlt \
				;; -fdebug-info-for-profiling -mno-omit-leaf-frame-pointer \
				;; -fno-omit-frame-pointer -fno-optimize-sibling-calls -m64 -Wl,-build-id \
				;; memprof.cc -o pgo.exe -fprofile-generate=.
				;; $ ./pgo.exe
				;; $ mv default_*.profraw memprof_pgo.profraw
				;;
				snehasish: ./pgo.exe
				;; # Generate below LLVM IR for use in matching:
				;; $ clang++ -gmlt -fdebug-info-for-profiling -fno-omit-frame-pointer \
				;; -fno-optimize-sibling-calls memprof.cc -S -emit-llvm

				;; Generate indexed profiles of all combinations:
				; RUN: llvm-profdata merge %S/Inputs/memprof.memprofraw --profiled-binary %S/Inputs/memprof.exe -o %t.memprofdata
				; RUN: llvm-profdata merge %S/Inputs/memprof_pgo.profraw %S/Inputs/memprof.memprofraw --profiled-binary %S/Inputs/memprof.exe -o %t.pgomemprofdata
				; RUN: llvm-profdata merge %S/Inputs/memprof_pgo.profraw -o %t.pgoprofdata

				;; In all below cases we should not get any messages about missing profile data
				;; for any functions. Either we are not performing any matching for a particular
				;; profile type or we are performing the matching and it should be successful.
				; ALL-NOT: memprof record not found for function hash
				snehasish: --check-prefixes=MEMPROF,ALL can be used instead.
				tejohnson (author): Done here and elsewhere.
				; ALL-NOT: no profile data available for function

				;; Feed back memprof-only profile
				; RUN: opt < %s -passes=pgo-instr-use -pgo-test-profile-file=%t.memprofdata -pgo-warn-missing-function -S 2>&1 | FileCheck %s --check-prefixes=MEMPROF,ALL,MEMPROFONLY
				; There should not be any PGO metadata
				; MEMPROFONLY-NOT: !prof

				;; Feed back pgo-only profile
				; RUN: opt < %s -passes=pgo-instr-use -pgo-test-profile-file=%t.pgoprofdata -pgo-warn-missing-function -S 2>&1 | FileCheck %s --check-prefixes=PGO,ALL,PGOONLY
				; There should not be any memprof related metadata
				; PGOONLY-NOT: !memprof
				; PGOONLY-NOT: !callsite

				snehasish: I suspect that the check lines are redundant. I think FileCheck scans the entire file and groups conditions by prefix. So we could have the 3 run lines followed by a group of prefix checks:
					; ALL-NOT: memprof record not found for function hash
					; ALL-NOT: no profile data available for function
					; MEMPROF-NOT: !prof
					; PGOONLY-NOT: !memprof
					; PGOONLY-NOT: !callsite
				tejohnson (author): It doesn't group them by prefix, but you are right there are a lot of redundant checks in this test. And one that is not correct (see below). I have cleaned this up.
				;; Feed back pgo+memprof-only profile
				; RUN: opt < %s -passes=pgo-instr-use -pgo-test-profile-file=%t.pgomemprofdata -pgo-warn-missing-function -S 2>&1 | FileCheck %s --check-prefixes=MEMPROF,PGO,ALL

				; ModuleID = 'memprof.cc'
				source_filename = "memprof.cc"
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"
				tejohnson (author): The comment here and below and one of the checks is incorrect. This case is testing a pgo+memprof profile.

				; Function Attrs: mustprogress noinline optnone uwtable
				; ALL-LABEL: define dso_local noundef ptr @_Z3foov()
				; There should be some PGO metadata
				; PGO: !prof
				define dso_local noundef ptr @_Z3foov() #0 !dbg !10 {
				entry:
				; MEMPROF: call {{.*}} @_Znam{{.*}} !memprof ![[M1:[0-9]+]], !callsite ![[C1:[0-9]+]]
				tejohnson (author): This is not correct for this test. But worked because it matched the first !prof against the first PGO label below after ALL-LABEL. This was just uselessly checking that there were no additional !prof before that. I have removed this and made the earlier similar check a MEMPROFONLY check.
				%call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #6, !dbg !13
				ret ptr %call, !dbg !14
				}

				; Function Attrs: nobuiltin allocsize(0)
				declare noundef nonnull ptr @_Znam(i64 noundef) #1

				; Function Attrs: mustprogress noinline optnone uwtable
				; ALL-LABEL: define dso_local noundef ptr @_Z4foo2v()
				define dso_local noundef ptr @_Z4foo2v() #0 !dbg !15 {
				entry:
				; MEMPROF: call {{.*}} @_Z3foov{{.*}} !callsite ![[C2:[0-9]+]]
				%call = call noundef ptr @_Z3foov(), !dbg !16
				ret ptr %call, !dbg !17
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define dso_local noundef ptr @_Z3barv() #0 !dbg !18 {
				entry:
				; MEMPROF: call {{.*}} @_Z4foo2v{{.*}} !callsite ![[C3:[0-9]+]]
				%call = call noundef ptr @_Z4foo2v(), !dbg !19
				ret ptr %call, !dbg !20
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define dso_local noundef ptr @_Z3bazv() #0 !dbg !21 {
				entry:
				; MEMPROF: call {{.*}} @_Z4foo2v{{.*}} !callsite ![[C4:[0-9]+]]
				%call = call noundef ptr @_Z4foo2v(), !dbg !22
				ret ptr %call, !dbg !23
				}

				; Function Attrs: mustprogress noinline optnone uwtable
				define dso_local noundef ptr @_Z7recursej(i32 noundef %n) #0 !dbg !24 {
				entry:
				%retval = alloca ptr, align 8
				%n.addr = alloca i32, align 4
				store i32 %n, ptr %n.addr, align 4
				%0 = load i32, ptr %n.addr, align 4, !dbg !25
				%tobool = icmp ne i32 %0, 0, !dbg !25
				br i1 %tobool, label %if.end, label %if.then, !dbg !26

				if.then: ; preds = %entry
				; MEMPROF: call {{.*}} @_Z3foov{{.*}} !callsite ![[C5:[0-9]+]]
				%call = call noundef ptr @_Z3foov(), !dbg !27
				store ptr %call, ptr %retval, align 8, !dbg !28
				br label %return, !dbg !28

				if.end: ; preds = %entry
				%1 = load i32, ptr %n.addr, align 4, !dbg !29
				%sub = sub i32 %1, 1, !dbg !30
				; MEMPROF: call {{.*}} @_Z7recursej{{.*}} !callsite ![[C6:[0-9]+]]
				%call1 = call noundef ptr @_Z7recursej(i32 noundef %sub), !dbg !31
				store ptr %call1, ptr %retval, align 8, !dbg !32
				br label %return, !dbg !32

				return: ; preds = %if.end, %if.then
				%2 = load ptr, ptr %retval, align 8, !dbg !33
				ret ptr %2, !dbg !33
				}

				; Function Attrs: mustprogress noinline norecurse optnone uwtable
				define dso_local noundef i32 @main(i32 noundef %argc, ptr noundef %argv) #2 !dbg !34 {
				entry:
				%retval = alloca i32, align 4
				%argc.addr = alloca i32, align 4
				%argv.addr = alloca ptr, align 8
				%a = alloca ptr, align 8
				%b = alloca ptr, align 8
				%c = alloca ptr, align 8
				%d = alloca ptr, align 8
				%e = alloca ptr, align 8
				%f = alloca ptr, align 8
				%i = alloca i32, align 4
				%g = alloca ptr, align 8
				store i32 0, ptr %retval, align 4
				store i32 %argc, ptr %argc.addr, align 4
				store ptr %argv, ptr %argv.addr, align 8
				; MEMPROF: call {{.*}} @_Znam{{.*}} #[[A1:[0-9]+]]
				%call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #6, !dbg !35
				store ptr %call, ptr %a, align 8, !dbg !36
				; MEMPROF: call {{.*}} @_Znam{{.*}} #[[A2:[0-9]+]]
				%call1 = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #6, !dbg !37
				store ptr %call1, ptr %b, align 8, !dbg !38
				; MEMPROF: call {{.*}} @_Z3foov{{.*}} !callsite ![[C7:[0-9]+]]
				%call2 = call noundef ptr @_Z3foov(), !dbg !39
				store ptr %call2, ptr %c, align 8, !dbg !40
				; MEMPROF: call {{.*}} @_Z3foov{{.*}} !callsite ![[C8:[0-9]+]]
				%call3 = call noundef ptr @_Z3foov(), !dbg !41
				store ptr %call3, ptr %d, align 8, !dbg !42
				; MEMPROF: call {{.*}} @_Z3barv{{.*}} !callsite ![[C9:[0-9]+]]
				%call4 = call noundef ptr @_Z3barv(), !dbg !43
				store ptr %call4, ptr %e, align 8, !dbg !44
				; MEMPROF: call {{.*}} @_Z3bazv{{.*}} !callsite ![[C10:[0-9]+]]
				%call5 = call noundef ptr @_Z3bazv(), !dbg !45
				store ptr %call5, ptr %f, align 8, !dbg !46
				%0 = load ptr, ptr %a, align 8, !dbg !47
				call void @llvm.memset.p0.i64(ptr align 1 %0, i8 0, i64 10, i1 false), !dbg !48
				%1 = load ptr, ptr %b, align 8, !dbg !49
				call void @llvm.memset.p0.i64(ptr align 1 %1, i8 0, i64 10, i1 false), !dbg !50
				%2 = load ptr, ptr %c, align 8, !dbg !51
				call void @llvm.memset.p0.i64(ptr align 1 %2, i8 0, i64 10, i1 false), !dbg !52
				%3 = load ptr, ptr %d, align 8, !dbg !53
				call void @llvm.memset.p0.i64(ptr align 1 %3, i8 0, i64 10, i1 false), !dbg !54
				%4 = load ptr, ptr %e, align 8, !dbg !55
				call void @llvm.memset.p0.i64(ptr align 1 %4, i8 0, i64 10, i1 false), !dbg !56
				%5 = load ptr, ptr %f, align 8, !dbg !57
				call void @llvm.memset.p0.i64(ptr align 1 %5, i8 0, i64 10, i1 false), !dbg !58
				%6 = load ptr, ptr %a, align 8, !dbg !59
				%isnull = icmp eq ptr %6, null, !dbg !60
				br i1 %isnull, label %delete.end, label %delete.notnull, !dbg !60

				delete.notnull: ; preds = %entry
				call void @_ZdaPv(ptr noundef %6) #7, !dbg !61
				br label %delete.end, !dbg !61

				delete.end: ; preds = %delete.notnull, %entry
				%7 = load ptr, ptr %c, align 8, !dbg !63
				%isnull6 = icmp eq ptr %7, null, !dbg !64
				br i1 %isnull6, label %delete.end8, label %delete.notnull7, !dbg !64

				delete.notnull7: ; preds = %delete.end
				call void @_ZdaPv(ptr noundef %7) #7, !dbg !65
				br label %delete.end8, !dbg !65

				delete.end8: ; preds = %delete.notnull7, %delete.end
				%call9 = call i32 @sleep(i32 noundef 200), !dbg !66
				%8 = load ptr, ptr %b, align 8, !dbg !67
				%isnull10 = icmp eq ptr %8, null, !dbg !68
				br i1 %isnull10, label %delete.end12, label %delete.notnull11, !dbg !68

				delete.notnull11: ; preds = %delete.end8
				call void @_ZdaPv(ptr noundef %8) #7, !dbg !69
				br label %delete.end12, !dbg !69

				delete.end12: ; preds = %delete.notnull11, %delete.end8
				%9 = load ptr, ptr %d, align 8, !dbg !70
				%isnull13 = icmp eq ptr %9, null, !dbg !71
				br i1 %isnull13, label %delete.end15, label %delete.notnull14, !dbg !71

				delete.notnull14: ; preds = %delete.end12
				call void @_ZdaPv(ptr noundef %9) #7, !dbg !72
				br label %delete.end15, !dbg !72

				delete.end15: ; preds = %delete.notnull14, %delete.end12
				%10 = load ptr, ptr %e, align 8, !dbg !73
				%isnull16 = icmp eq ptr %10, null, !dbg !74
				br i1 %isnull16, label %delete.end18, label %delete.notnull17, !dbg !74

				delete.notnull17: ; preds = %delete.end15
				call void @_ZdaPv(ptr noundef %10) #7, !dbg !75
				br label %delete.end18, !dbg !75

				delete.end18: ; preds = %delete.notnull17, %delete.end15
				%11 = load ptr, ptr %f, align 8, !dbg !76
				%isnull19 = icmp eq ptr %11, null, !dbg !77
				br i1 %isnull19, label %delete.end21, label %delete.notnull20, !dbg !77

				delete.notnull20: ; preds = %delete.end18
				call void @_ZdaPv(ptr noundef %11) #7, !dbg !78
				br label %delete.end21, !dbg !78

				delete.end21: ; preds = %delete.notnull20, %delete.end18
				store i32 0, ptr %i, align 4, !dbg !79
				br label %for.cond, !dbg !80

				for.cond: ; preds = %for.inc, %delete.end21
				%12 = load i32, ptr %i, align 4, !dbg !81
				%cmp = icmp ult i32 %12, 2, !dbg !82
				br i1 %cmp, label %for.body, label %for.end, !dbg !83

				for.body: ; preds = %for.cond
				%13 = load i32, ptr %i, align 4, !dbg !84
				%add = add i32 %13, 3, !dbg !85
				; MEMPROF: call {{.*}} @_Z7recursej{{.*}} !callsite ![[C11:[0-9]+]]
				%call22 = call noundef ptr @_Z7recursej(i32 noundef %add), !dbg !86
				store ptr %call22, ptr %g, align 8, !dbg !87
				%14 = load ptr, ptr %g, align 8, !dbg !88
				call void @llvm.memset.p0.i64(ptr align 1 %14, i8 0, i64 10, i1 false), !dbg !89
				%15 = load i32, ptr %i, align 4, !dbg !90
				%tobool = icmp ne i32 %15, 0, !dbg !90
				br i1 %tobool, label %if.end, label %if.then, !dbg !91

				if.then: ; preds = %for.body
				%call23 = call i32 @sleep(i32 noundef 200), !dbg !92
				br label %if.end, !dbg !92

				if.end: ; preds = %if.then, %for.body
				%16 = load ptr, ptr %g, align 8, !dbg !93
				%isnull24 = icmp eq ptr %16, null, !dbg !94
				br i1 %isnull24, label %delete.end26, label %delete.notnull25, !dbg !94

				delete.notnull25: ; preds = %if.end
				call void @_ZdaPv(ptr noundef %16) #7, !dbg !95
				br label %delete.end26, !dbg !95

				delete.end26: ; preds = %delete.notnull25, %if.end
				br label %for.inc, !dbg !96

				for.inc: ; preds = %delete.end26
				%17 = load i32, ptr %i, align 4, !dbg !97
				%inc = add i32 %17, 1, !dbg !97
				store i32 %inc, ptr %i, align 4, !dbg !97
				br label %for.cond, !dbg !99, !llvm.loop !100

				for.end: ; preds = %for.cond
				ret i32 0, !dbg !103
				}

				; MEMPROF: #[[A1]] = { builtin allocsize(0) "memprof"="notcold" }
				; MEMPROF: #[[A2]] = { builtin allocsize(0) "memprof"="cold" }
				; MEMPROF: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]], ![[MIB5:[0-9]+]]}
				; MEMPROF: ![[MIB1]] = !{![[STACK1:[0-9]+]], !"notcold"}
				; MEMPROF: ![[STACK1]] = !{i64 -2458008693472584243, i64 3952224878458323, i64 -6408471049535768163, i64 -6408471049535768163, i64 -6408471049535768163, i64 -6408471049535768163}
				; MEMPROF: ![[MIB2]] = !{![[STACK2:[0-9]+]], !"cold"}
				; MEMPROF: ![[STACK2]] = !{i64 -2458008693472584243, i64 3952224878458323, i64 -6408471049535768163, i64 -6408471049535768163, i64 -6408471049535768163, i64 -2523213715586649525}
				; MEMPROF: ![[MIB3]] = !{![[STACK3:[0-9]+]], !"cold"}
				; MEMPROF: ![[STACK3]] = !{i64 -2458008693472584243, i64 4060711043150162853}
				; MEMPROF: ![[MIB4]] = !{![[STACK4:[0-9]+]], !"notcold"}
				; MEMPROF: ![[STACK4]] = !{i64 -2458008693472584243, i64 6197270713521362189}
				; MEMPROF: ![[MIB5]] = !{![[STACK5:[0-9]+]], !"cold"}
				; MEMPROF: ![[STACK5]] = !{i64 -2458008693472584243, i64 -8079659623765193173}
				; MEMPROF: ![[C1]] = !{i64 -2458008693472584243}
				; MEMPROF: ![[C2]] = !{i64 -8079659623765193173}
				; MEMPROF: ![[C3]] = !{i64 -972865200055133905}
				; MEMPROF: ![[C4]] = !{i64 -4805294506621015872}
				; MEMPROF: ![[C5]] = !{i64 3952224878458323}
				; MEMPROF: ![[C6]] = !{i64 -6408471049535768163}
				; MEMPROF: ![[C7]] = !{i64 6197270713521362189}
				; MEMPROF: ![[C8]] = !{i64 4060711043150162853}
				; MEMPROF: ![[C9]] = !{i64 1503792662459039327}
				; MEMPROF: ![[C10]] = !{i64 -1910610273966575552}
				; MEMPROF: ![[C11]] = !{i64 -2523213715586649525}

				; Function Attrs: argmemonly nofree nounwind willreturn writeonly
				declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #3

				; Function Attrs: nobuiltin nounwind
				declare void @_ZdaPv(ptr noundef) #4

				declare i32 @sleep(i32 noundef) #5

				attributes #0 = { mustprogress noinline optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #1 = { nobuiltin allocsize(0) "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #2 = { mustprogress noinline norecurse optnone uwtable "disable-tail-calls"="true" "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #3 = { argmemonly nofree nounwind willreturn writeonly }
				attributes #4 = { nobuiltin nounwind "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #5 = { "disable-tail-calls"="true" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #6 = { builtin allocsize(0) }
				attributes #7 = { builtin nounwind }

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!2, !3, !4, !5, !6, !7, !8}
				!llvm.ident = !{!9}

				!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, producer: "clang version 15.0.0 (https://github.com/llvm/llvm-project.git 6cbe6284d1f0a088b5c6482ae27b738f03d82fe7)", isOptimized: false, runtimeVersion: 0, emissionKind: LineTablesOnly, splitDebugInlining: false, debugInfoForProfiling: true, nameTableKind: None)
				!1 = !DIFile(filename: "memprof.cc", directory: "/usr/local/google/home/tejohnson/llvm/tmp", checksumkind: CSK_MD5, checksum: "e8c40ebe4b21776b4d60e9632cbc13c2")
				!2 = !{i32 7, !"Dwarf Version", i32 5}
				!3 = !{i32 2, !"Debug Info Version", i32 3}
				!4 = !{i32 1, !"wchar_size", i32 4}
				!5 = !{i32 7, !"PIC Level", i32 2}
				!6 = !{i32 7, !"PIE Level", i32 2}
				!7 = !{i32 7, !"uwtable", i32 2}
				!8 = !{i32 7, !"frame-pointer", i32 2}
				!9 = !{!"clang version 15.0.0 (https://github.com/llvm/llvm-project.git 6cbe6284d1f0a088b5c6482ae27b738f03d82fe7)"}
				!10 = distinct !DISubprogram(name: "foo", linkageName: "_Z3foov", scope: !1, file: !1, line: 4, type: !11, scopeLine: 4, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !12)
				!11 = !DISubroutineType(types: !12)
				!12 = !{}
				!13 = !DILocation(line: 5, column: 10, scope: !10)
				!14 = !DILocation(line: 5, column: 3, scope: !10)
				!15 = distinct !DISubprogram(name: "foo2", linkageName: "_Z4foo2v", scope: !1, file: !1, line: 7, type: !11, scopeLine: 7, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !12)
				!16 = !DILocation(line: 8, column: 10, scope: !15)
				!17 = !DILocation(line: 8, column: 3, scope: !15)
				!18 = distinct !DISubprogram(name: "bar", linkageName: "_Z3barv", scope: !1, file: !1, line: 10, type: !11, scopeLine: 10, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !12)
				!19 = !DILocation(line: 11, column: 10, scope: !18)
				!20 = !DILocation(line: 11, column: 3, scope: !18)
				!21 = distinct !DISubprogram(name: "baz", linkageName: "_Z3bazv", scope: !1, file: !1, line: 13, type: !11, scopeLine: 13, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !12)
				!22 = !DILocation(line: 14, column: 10, scope: !21)
				!23 = !DILocation(line: 14, column: 3, scope: !21)
				!24 = distinct !DISubprogram(name: "recurse", linkageName: "_Z7recursej", scope: !1, file: !1, line: 16, type: !11, scopeLine: 16, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !12)
				!25 = !DILocation(line: 17, column: 8, scope: !24)
				!26 = !DILocation(line: 17, column: 7, scope: !24)
				!27 = !DILocation(line: 18, column: 12, scope: !24)
				!28 = !DILocation(line: 18, column: 5, scope: !24)
				!29 = !DILocation(line: 19, column: 18, scope: !24)
				!30 = !DILocation(line: 19, column: 19, scope: !24)
				!31 = !DILocation(line: 19, column: 10, scope: !24)
				!32 = !DILocation(line: 19, column: 3, scope: !24)
				!33 = !DILocation(line: 20, column: 1, scope: !24)
				!34 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 21, type: !11, scopeLine: 21, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !12)
				!35 = !DILocation(line: 25, column: 13, scope: !34)
				!36 = !DILocation(line: 25, column: 9, scope: !34)
				!37 = !DILocation(line: 26, column: 13, scope: !34)
				!38 = !DILocation(line: 26, column: 9, scope: !34)
				!39 = !DILocation(line: 27, column: 13, scope: !34)
				!40 = !DILocation(line: 27, column: 9, scope: !34)
				!41 = !DILocation(line: 28, column: 13, scope: !34)
				!42 = !DILocation(line: 28, column: 9, scope: !34)
				!43 = !DILocation(line: 29, column: 13, scope: !34)
				!44 = !DILocation(line: 29, column: 9, scope: !34)
				!45 = !DILocation(line: 30, column: 13, scope: !34)
				!46 = !DILocation(line: 30, column: 9, scope: !34)
				!47 = !DILocation(line: 31, column: 10, scope: !34)
				!48 = !DILocation(line: 31, column: 3, scope: !34)
				!49 = !DILocation(line: 32, column: 10, scope: !34)
				!50 = !DILocation(line: 32, column: 3, scope: !34)
				!51 = !DILocation(line: 33, column: 10, scope: !34)
				!52 = !DILocation(line: 33, column: 3, scope: !34)
				!53 = !DILocation(line: 34, column: 10, scope: !34)
				!54 = !DILocation(line: 34, column: 3, scope: !34)
				!55 = !DILocation(line: 35, column: 10, scope: !34)
				!56 = !DILocation(line: 35, column: 3, scope: !34)
				!57 = !DILocation(line: 36, column: 10, scope: !34)
				!58 = !DILocation(line: 36, column: 3, scope: !34)
				!59 = !DILocation(line: 38, column: 12, scope: !34)
				!60 = !DILocation(line: 38, column: 3, scope: !34)
				!61 = !DILocation(line: 38, column: 3, scope: !62)
				!62 = !DILexicalBlockFile(scope: !34, file: !1, discriminator: 2)
				!63 = !DILocation(line: 39, column: 12, scope: !34)
				!64 = !DILocation(line: 39, column: 3, scope: !34)
				!65 = !DILocation(line: 39, column: 3, scope: !62)
				!66 = !DILocation(line: 41, column: 3, scope: !34)
				!67 = !DILocation(line: 42, column: 12, scope: !34)
				!68 = !DILocation(line: 42, column: 3, scope: !34)
				!69 = !DILocation(line: 42, column: 3, scope: !62)
				!70 = !DILocation(line: 43, column: 12, scope: !34)
				!71 = !DILocation(line: 43, column: 3, scope: !34)
				!72 = !DILocation(line: 43, column: 3, scope: !62)
				!73 = !DILocation(line: 44, column: 12, scope: !34)
				!74 = !DILocation(line: 44, column: 3, scope: !34)
				!75 = !DILocation(line: 44, column: 3, scope: !62)
				!76 = !DILocation(line: 45, column: 12, scope: !34)
				!77 = !DILocation(line: 45, column: 3, scope: !34)
				!78 = !DILocation(line: 45, column: 3, scope: !62)
				!79 = !DILocation(line: 51, column: 17, scope: !34)
				!80 = !DILocation(line: 51, column: 8, scope: !34)
				!81 = !DILocation(line: 51, column: 24, scope: !62)
				!82 = !DILocation(line: 51, column: 26, scope: !62)
				!83 = !DILocation(line: 51, column: 3, scope: !62)
				!84 = !DILocation(line: 52, column: 23, scope: !34)
				!85 = !DILocation(line: 52, column: 25, scope: !34)
				!86 = !DILocation(line: 52, column: 15, scope: !34)
				!87 = !DILocation(line: 52, column: 11, scope: !34)
				!88 = !DILocation(line: 53, column: 12, scope: !34)
				!89 = !DILocation(line: 53, column: 5, scope: !34)
				!90 = !DILocation(line: 54, column: 10, scope: !34)
				!91 = !DILocation(line: 54, column: 9, scope: !34)
				!92 = !DILocation(line: 55, column: 7, scope: !34)
				!93 = !DILocation(line: 56, column: 14, scope: !34)
				!94 = !DILocation(line: 56, column: 5, scope: !34)
				!95 = !DILocation(line: 56, column: 5, scope: !62)
				!96 = !DILocation(line: 57, column: 3, scope: !34)
				!97 = !DILocation(line: 51, column: 32, scope: !98)
				!98 = !DILexicalBlockFile(scope: !34, file: !1, discriminator: 4)
				!99 = !DILocation(line: 51, column: 3, scope: !98)
				!100 = distinct !{!100, !101, !96, !102}
				!101 = !DILocation(line: 51, column: 3, scope: !34)
				!102 = !{!"llvm.loop.mustprogress"}
				!103 = !DILocation(line: 58, column: 3, scope: !34)
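
The expected output in memprof.ll shows two kinds of annotation: the direct `new` calls in `main` carry a `"memprof"` function attribute, while the allocation in `foo` carries `!memprof` metadata with five entries of differing coldness. One reading of this is that an allocation whose profiled contexts all agree gets an attribute, and one with mixed contexts gets per-context metadata. The sketch below illustrates that reading only. `Context`, `classify`, and `describeAnnotation` are hypothetical names, and both threshold constants are assumptions chosen to be consistent with the `sleep(200)` in the test source, not values taken from this diff.

```cpp
#include <cstdint>
#include <string>
#include <vector>

enum class AllocType { NotCold, Cold };

// One profiled allocation context (hypothetical, simplified record).
struct Context {
  std::vector<std::int64_t> StackIds; // allocation call site first, as in the test
  double AccessesPerByte;
  unsigned LifetimeSeconds;
};

// Assumed thresholds, for illustration only.
constexpr double kColdAccessesPerByte = 10.0;
constexpr unsigned kColdLifetimeSeconds = 200;

// A context is treated as cold when it is both rarely accessed and long lived.
AllocType classify(const Context &C) {
  bool RarelyAccessed = C.AccessesPerByte < kColdAccessesPerByte;
  bool LongLived = C.LifetimeSeconds >= kColdLifetimeSeconds;
  return (RarelyAccessed && LongLived) ? AllocType::Cold : AllocType::NotCold;
}

std::string toString(AllocType T) {
  return T == AllocType::Cold ? "cold" : "notcold";
}

// Describes the annotation one allocation call would receive in this model.
std::string describeAnnotation(const std::vector<Context> &Contexts) {
  if (Contexts.empty())
    return "none";
  AllocType First = classify(Contexts.front());
  bool Uniform = true;
  for (const Context &C : Contexts)
    if (classify(C) != First)
      Uniform = false;
  if (Uniform)
    return "attribute \"memprof\"=\"" + toString(First) + "\"";
  return "!memprof with " + std::to_string(Contexts.size()) + " MIB entries";
}
```

Under these assumptions, a single short-lived context (like `a` in the test source) yields a `notcold` attribute, a single long-lived one (like `b`) yields a `cold` attribute, and a call reached through both kinds of context yields metadata.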

llvm/test/Transforms/PGOProfile/memprofmissingfunc.ll

This file was added.

				;; Tests that we get a missing memprof error for a function not in profile when
				;; using -pgo-warn-missing-function.

				;; TODO: Use text profile inputs once that is available for memprof.

				;; The raw profiles have been generated from the source used for the memprof.ll
				;; test (see comments at the top of that file).

				; RUN: llvm-profdata merge %S/Inputs/memprof.memprofraw --profiled-binary %S/Inputs/memprof.exe -o %t.memprofdata

				; RUN: opt < %s -passes=pgo-instr-use -pgo-test-profile-file=%t.memprofdata -pgo-warn-missing-function -S 2>&1 | FileCheck %s

				; CHECK: memprof record not found for function hash {{.*}} _Z16funcnotinprofilev

				snehasish: Should we use a regex here to make it more resilient since we don't care about the exact hash?
				; ModuleID = 'memprofmissingfunc.cc'
				source_filename = "memprofmissingfunc.cc"
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: mustprogress noinline nounwind optnone uwtable
				define dso_local void @_Z16funcnotinprofilev() {
				entry:
				ret void
				}
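
What this test exercises can be pictured as a keyed lookup that reports a miss only when the warning option is on. The sketch below is a stand-alone illustration, not the pass's implementation: `ProfileIndex`, `hashName`, and `missingWarning` are hypothetical, the FNV-1a hash is only a placeholder for whatever key LLVM actually computes, and the text between the hash and the function name is an assumption, since the CHECK line only pins it down as `{{.*}}`.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified profile index: function hash -> number of records.
using ProfileIndex = std::map<std::uint64_t, int>;

// Placeholder 64-bit FNV-1a hash; NOT the hash LLVM uses for profile lookup.
std::uint64_t hashName(const std::string &Name) {
  std::uint64_t H = 14695981039346656037ULL;
  for (unsigned char C : Name) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

// Returns the warning text for a function missing from the index, or an
// empty string when the function is present or warnings are disabled.
std::string missingWarning(const ProfileIndex &Index, const std::string &Name,
                           bool WarnMissingFunction) {
  std::uint64_t Hash = hashName(Name);
  if (Index.count(Hash) || !WarnMissingFunction)
    return "";
  return "memprof record not found for function hash " +
         std::to_string(Hash) + " " + Name;
}
```

Matching the hash with a regex, as the reviewer suggests, keeps a test like this independent of the particular hash value.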

Diff 462269
