This is an archive of the discontinued LLVM Phabricator instance.


Differential D104129

[CSSPGO] Report zero-count probe in profile instead of dangling probes.
ClosedPublic

Authored by hoy on Jun 11 2021, 9:42 AM.


Details

Reviewers

wenlei
wlei
wmi

Commits

rGcef9b96b01b7: [CSSPGO] Report zero-count probe in profile instead of dangling probes.

Summary

Previously, dangling samples were represented by INT64_MAX in the sample profile, while probes that were never executed were not reported. This was based on the observation that dangling probes made up a smaller portion of probes than zero-count probes. However, with compiler optimizations, dangling probes end up being a large portion of all probes in general, and reporting them does not make sense from a profile-size point of view. This change flips sample reporting by reporting zero-count probes instead. This enables a dangling probe to be represented by nothing (a missing entry in the profile). This has several benefits:

1. Reducing sample profile size in optimize mode, even when the number of non-executed probes exceeds the number of dangling probes, since INT64_MAX takes more space than 0 to encode.

2. Binary size savings. There is no need to encode dangling probes anymore, since missing probes are treated as dangling in the profile reader.

3. Reducing compiler work to track dangling probes. However, for probes that are real dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of being mistaken for dangling probes.

4. Improving counts quality by respecting the counts already collected on the non-dangling copy of a probe. A probe, when duplicated, gets two copies at runtime. If one of them is dangling while the other is not, merging the two probes at profile generation time will cause the real samples collected on the non-dangling one to be discarded. Not reporting the dangling counterpart will keep the real samples.

5. Better readability.

6. Being consistent with the non-CS dwarf line number based profile. Zero counts are trusted by the compiler's counts inferencer, while missing counts will be inferred by the compiler.

Note that the current patch does not include any work for #3. There will be follow-up changes.

For #1, I've seen the text profile of a large Facebook service reduced by 7%. For the extbinary profile, the size of LBRProfileSection is reduced by 35%.

For #4, I have seen general counts quality for SPEC2017 improved by 10%.
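As a rough illustration of the size argument in #1, the sketch below assumes counts are written with a ULEB128-style variable-length encoding (seven payload bits per byte). The helper name `uleb128Size` is hypothetical and is not an LLVM API; the sentinel compared against is the `UINT64_MAX` value that the diff for this revision removes as `InvalidProbeCount`.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): number of bytes needed to store
// Value in a ULEB128-style encoding, which carries 7 payload bits per byte.
unsigned uleb128Size(uint64_t Value) {
  unsigned Size = 0;
  do {
    Value >>= 7; // consume one byte's worth of payload
    ++Size;
  } while (Value != 0);
  return Size;
}
```

Under that assumption a zero count occupies a single byte, while the all-ones sentinel needs ten, so an explicit zero can be cheaper than an explicit dangling marker even when zero-count probes are more numerous.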

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hoy created this revision. Jun 11 2021, 9:42 AM

Herald added a subscriber: wenlei. Jun 11 2021, 9:42 AM

hoy requested review of this revision. Jun 11 2021, 9:42 AM

Herald added a project: Restricted Project. Jun 11 2021, 9:42 AM

Herald added a subscriber: llvm-commits.

hoy added reviewers: wenlei, wlei, wmi. Jun 11 2021, 9:44 AM

hoy added a subscriber: spupyrev.

Will come back with more numbers.

Harbormaster completed remote builds in B108835: Diff 351480. Jun 11 2021, 10:21 AM

It is strange that representation of probes affects counts quality (and hence, generated binaries). Is this a temporary state (that hopefully will be improved in the future)?

In general I agree that the suggested representation (0 for zero-counts, missing for dangles) is way more readable, and thus less error-prone.

In D104129#2817282, @spupyrev wrote:

It is strange that representation of probes affects counts quality (and hence, generated binaries). Is this a temporary state (that hopefully will be improved in the future)?

In general I agree that the suggested representation (0 for zero-counts, missing for dangles) is way more readable, and thus less error-prone.

The current change affects counts quality in two ways:

1. Reporting more dangling probes than real dead probes. I believe we haven't spotted all places in the compiler that can remove dangling probes, and in those cases dangling probes were treated as real dead. With the current patch, more probes, including some real dead probes, are reported as dangling. This could be good or bad, depending on the benchmark. Current results show this is better in general. As the next step, we're thinking about identifying real dead probes in the compiler and marking them specially.

2. Reporting what's already collected on the non-dangling sibling of a dangling probe. Previously, reporting the danglingness of a probe caused real samples collected on other non-dangling copies of the same probe to be lost.

#2 can be improved by reporting both danglingness and real samples in the future, if that's helpful. We would need to figure out how to encode them together.
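The effect on merging can be sketched with a simplified model. This is not LLVM's `SampleRecord::merge`: the `ProbeCounts` alias, the `Dangling` sentinel and both function names are hypothetical, chosen only to contrast the old rule (a dangling marker on either copy wins) with the new one (a dangling copy is simply absent).

```cpp
#include <cstdint>
#include <map>

// Hypothetical model: probe id -> sample count for one copy of a probe set.
using ProbeCounts = std::map<uint32_t, uint64_t>;

// Stand-in for the old sentinel that marked a dangling probe.
constexpr uint64_t Dangling = UINT64_MAX;

// Old rule: if either copy is marked dangling, the merged probe is dangling
// and real samples collected on the other copy are discarded.
void mergeOld(ProbeCounts &Into, const ProbeCounts &From) {
  for (const auto &Entry : From) {
    auto It = Into.find(Entry.first);
    if (It == Into.end())
      Into[Entry.first] = Entry.second;
    else if (It->second == Dangling || Entry.second == Dangling)
      It->second = Dangling;
    else
      It->second += Entry.second;
  }
}

// New rule: dangling copies are never reported, so merging is a plain sum
// over the entries that are present.
void mergeNew(ProbeCounts &Into, const ProbeCounts &From) {
  for (const auto &Entry : From)
    Into[Entry.first] += Entry.second;
}
```

With probe 2 holding 5 real samples on one copy, `mergeOld` against a copy that reports probe 2 as `Dangling` leaves the sentinel, whereas `mergeNew` against a copy that omits probe 2 keeps the count of 5.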

I collected more data for this change. For SPEC2017, besides the counts quality gain, I've also seen a 0.4% perf gain on average, mainly from 602.gcc_s, 623.xalancbmk_s and 625.x264_s. For cinder, I've seen an 8% counts quality gain with a neutral perf result.

Updating D104129: [CSSPGO] Report zero-count probe in profile instead of dangling probes.

Harbormaster completed remote builds in B109181: Diff 351974. Jun 14 2021, 2:36 PM

Thanks for working on this. Among all deleted probes, if the majority are actually dangling (i.e., removed by optimizations such as hoisting or merging, as opposed to DCE), it makes sense to default to dangling and explicitly mark the non-dangling ones. Previously we assumed that the majority of deleted probes were not dangling, hence we chose to mark the dangling ones explicitly.

It seems that our assumption has been wrong, which is why this change actually improves profile quality.

This enabled dangling probe to be represented by both INT64_MAX and none (missing entry in profile).

Why do we still want to allow INT64_MAX representation of dangling probe? Would it be cleaner to only use missing probe to represent dangled probes?

Reducing compiler work to track dangling probes. However, for probes that are real dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of mistreated as dangling probes.

While explicitly marking real dead probes can be a follow up patch. Should we remove moveAndDanglePseudoProbes completely together with this patch?

Moreover, since we treat a missing probe the same as a dangling probe in getProbeWeight, and hence in profile inference, it seems we no longer need the concept of dangling. All we need is unknown (missing) vs. known (explicitly marked 0 in the future for dead code, or a probe with a real non-zero count). The term dangling comes from the pass1 optimization side: dangle reflects how a probe looks after its containing block is removed. If we no longer need to track such probes on the pass1 optimization side, it seems the whole concept of dangling can go away to simplify things?

llvm/tools/llvm-profgen/ProfileGenerator.cpp
560	"Mainly for saving profile size" can be a misleading comment. The main benefits we see are: 1) better profile quality when we default all missing probes to unknown (previously we only treated marked probes as unknown), since we have more unknown probes than dead probes; 2) allowing a probe with a count to take precedence over dangling ones when merging. If doing this regressed profile quality, we probably wouldn't do it even if it led to a smaller profile. If we go further down this route, we may end up removing `InvalidProbeCount` altogether, and then "saving profile size" would be confusing to others, who wouldn't know where we came from.
564–565	Now that we don't do anything for dangling probes, can we remove `moveAndDanglePseudoProbes` from the compiler too? Those probes will simply be missing, rather than reported as `0`, with no special handling. Furthermore, this is the only place we generate `InvalidProbeCount`; with this change, can all the special casing for `InvalidProbeCount` be removed too?

Be consistent with non-probe profile.

AutoFDO profile usually does not fill in zeros. Counts below a certain threshold are omitted to save profile size.

Now in order to differentiate from unknown, the approach taken here is to always mark any known count including zero. In that sense this is different from AutoFDO (actually before this change, dangling probes are marked so zero counts can be omitted which is closer to AutoFDO for representing sparse profile).

Improving counts quality by respecting the counts already collected on the non-dangling sibling of the dangling probe

nit on the wording, sibling leads others to think it's a different probe under the same probe inline tree. Here, it's really just copies of the same probe (probes sharing the same Id).

In D104129#2818629, @wenlei wrote:

Thanks for working on this. Among all deleted probes, if the majority are actually dangling (i.e., removed by optimizations such as hoisting or merging, as opposed to DCE), it makes sense to default to dangling and explicitly mark the non-dangling ones. Previously we assumed that the majority of deleted probes were not dangling, hence we chose to mark the dangling ones explicitly.

It seems that our assumption has been wrong, which is why this change actually improves profile quality.

This enabled dangling probe to be represented by both INT64_MAX and none (missing entry in profile).

Why do we still want to allow INT64_MAX representation of dangling probe? Would it be cleaner to only use missing probe to represent dangled probes?

Yes, I was thinking about having it removed completely in a separate change. With this patch, we can still load profiles with the INT64_MAX annotation.

Reducing compiler work to track dangling probes. However, for probes that are real dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of mistreated as dangling probes.

While explicitly marking real dead probes can be a follow up patch. Should we remove moveAndDanglePseudoProbes completely together with this patch?

I plan to do it separately. Besides removing moveAndDanglePseudoProbes, we also need to remove dangling probes from empty blocks, i.e, the code in pseudo-probe-inserter.cpp.

Moreover, since we treat a missing probe the same as a dangling probe in getProbeWeight, and hence in profile inference, it seems we no longer need the concept of dangling. All we need is unknown (missing) vs. known (explicitly marked 0 in the future for dead code, or a probe with a real non-zero count). The term dangling comes from the pass1 optimization side: dangle reflects how a probe looks after its containing block is removed. If we no longer need to track such probes on the pass1 optimization side, it seems the whole concept of dangling can go away to simplify things?

Yes, we do not need the concept of dangling probes anymore, instead, we'll need the concept of real dead probes. The check for dangling probe in pass1 will be replaced with checking for real dead probes.
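The known-versus-unknown distinction discussed here can be modelled as a three-way classification. The following is only a sketch: `BodySamples` is a simplified stand-in keyed by probe id alone, and `ProbeState` and `classifyProbe` are hypothetical names that do not appear in the patch.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for a function's body samples: probe id -> count.
using BodySamples = std::map<uint32_t, uint64_t>;

enum class ProbeState {
  Unknown,      // no entry in the profile: the count is left to inference
  KnownZero,    // explicit 0: the probe was present but never sampled
  KnownNonZero  // explicit positive count collected from samples
};

// Classify a probe purely by whether it has an entry and what the entry says.
ProbeState classifyProbe(const BodySamples &Samples, uint32_t ProbeId) {
  auto It = Samples.find(ProbeId);
  if (It == Samples.end())
    return ProbeState::Unknown;
  return It->second == 0 ? ProbeState::KnownZero : ProbeState::KnownNonZero;
}
```

No sentinel value is involved: absence from the map carries the "unknown" meaning that the dangling marker used to carry.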

llvm/tools/llvm-profgen/ProfileGenerator.cpp
560	Sounds good. Will update comment.
564–565	Yes, planning to do it separately. There was the original patch that introduced dangling probe. Hopefully we can just revert it. What do you think?

Addressing Wenlei's comment.

Harbormaster completed remote builds in B109319: Diff 352170. Jun 15 2021, 10:36 AM

Removing InvalidProbeCount definition and uses.

Herald added subscribers: hiraditya, eraman. Jun 15 2021, 3:53 PM

Harbormaster completed remote builds in B109402: Diff 352277. Jun 15 2021, 7:11 PM

In D104129#2819438, @wenlei wrote:

Be consistent with non-probe profile.

AutoFDO profile usually does not fill in zeros. Counts below a certain threshold are omitted to save profile size.

Now in order to differentiate from unknown, the approach taken here is to always mark any known count including zero. In that sense this is different from AutoFDO (actually before this change, dangling probes are marked so zero counts can be omitted which is closer to AutoFDO for representing sparse profile).

Improving counts quality by respecting the counts already collected on the non-dangling sibling of the dangling probe

nit on the wording, sibling leads others to think it's a different probe under the same probe inline tree. Here, it's really just copies of the same probe (probes sharing the same Id).

Please update the description: 1) don't use the term sibling probe; 2) remove or update "Be consistent with non-probe profile."; 3) update "be represented by both INT64_MAX and none (missing entry in profile)".

llvm/tools/llvm-profgen/ProfileGenerator.cpp
560	If you mention dangling in the comments, make sure it will be updated when we remove the concept of dangling altogether. What I actually meant is that we don't need to explain the benefit comparing to the old approach since the old approach will be completely gone. Just comment it as if we arrive at this from a clean slate, in which case there should be no place for dangling probe. The extra context is good for commit message, but it's not so relevant for comment. Here we're simply marking probes that don't have any sample hits with zero count so we can differentiate probe with known count from unknown (deleted or other reason).

hoy added inline comments. Jun 15 2021, 9:52 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
560	I can remove the dangling term now, but feel that we need another term for the missing probes, or to mention why zero is reported. Maybe something like "Reporting zero for non-executed probes. The compiler will infer counts for deleted probes"?

wenlei added inline comments. Jun 15 2021, 10:11 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
560	Yeah something like "Explicitly assign zero count for remaining probes without sample hits. This is to differentiate from removed probes whose count is unknown and will be inferred in the compiler". Missing probe is probe optimized away, it should be intuitive. Not sure if we need a special term for that.. One thing is the removal of the term about dangling; the other is we don't need to explain all the benefit when the base will be gone.
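A minimal sketch of the behaviour that comment describes, under simplifying assumptions: `ProbesInBinary` stands for the probe ids that survived into the profiled binary, `Hits` for the counts attributed from samples, and `fillZeroCounts` is a hypothetical name rather than the function in `ProfileGenerator.cpp`.

```cpp
#include <cstdint>
#include <map>
#include <set>

// Hypothetical model: probe id -> sample count.
using ProbeCounts = std::map<uint32_t, uint64_t>;

// Explicitly assign a zero count to every remaining probe without sample
// hits. Probes that were removed from the binary never appear in
// ProbesInBinary, so they stay absent and their count remains unknown.
ProbeCounts fillZeroCounts(const std::set<uint32_t> &ProbesInBinary,
                           const ProbeCounts &Hits) {
  ProbeCounts Profile = Hits;
  for (uint32_t Id : ProbesInBinary)
    Profile.emplace(Id, 0); // no-op when Id already has a count
  return Profile;
}
```

For a binary that still contains probes 1, 2 and 4, with hits only on 2 and 4, the result lists `1: 0` next to the real counts and has no entry for a removed probe 3.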

hoy edited the summary of this revision. Jun 15 2021, 10:54 PM

Updating summary and comments.

Be consistent with dwarf line number based profile.

This is not accurate, see earlier comments. What I meant was not about "non-probe profile -> dwarf line number profile".

Otherwise lgtm. Thanks.

This revision is now accepted and ready to land. Jun 15 2021, 11:06 PM

In D104129#2821164, @wenlei wrote:

Be consistent with dwarf line number based profile.

This is not accurate, see earlier comments. What I meant was not about "non-probe profile -> dwarf line number profile".

Otherwise lgtm. Thanks.

It should be consistent with non-CS dwarf-based profile. CS dwarf-based profile doesn't have either zero or dangling reported, likely needs a fix. Is that what you meant?

In D104129#2821189, @hoy wrote:

It should be consistent with non-CS dwarf-based profile. CS dwarf-based profile doesn't have either zero or dangling reported, likely needs a fix. Is that what you meant?

Non-cs dwarf-based profile normally doesn't have zero counts in the profile. Any count below a certain threshold is omitted in the profile to save size. This is the default behavior for our production usage (I believe for google as well). This is why the current change that needs explicit zero is not consistent with afdo.

In D104129#2821201, @wenlei wrote:

Non-cs dwarf-based profile normally doesn't have zero counts in the profile. Any count below a certain threshold is omitted in the profile to save size. This is the default behavior for our production usage (I believe for google as well). This is why the current change that needs explicit zero is not consistent with afdo.

I see. I'm seeing zero counts in non-CS dwarf-based profile with our internal profile generation tool like below. I guess the production profile is generated with additional switches?

main:139344172:0
6: 0
9: 0
17: 0
18: 0
19: 0
20: 0
21: 0
23: 0
27: 0
28: 0
31: 0
32: 0
33: 0
34: 0
36: 0
41: 0
55: 0
64: 0
79: 0
84: 0
65411: 0
33: atoi:0

2: 0
65175: 0

39: read_min:27098

14: 0
17: 0
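To make the shape of the body lines in this excerpt concrete, here is a small parsing sketch. It is an assumption-laden illustration, not LLVM's profile reader: `parseBodyLine` is a hypothetical helper that accepts only the plain `offset: count` form seen above and deliberately rejects callsite lines such as `33: atoi:0`, discriminators, and call targets.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: parse a body line of the form "<offset>: <count>".
// Returns true and fills Offset/Count on success; returns false for any
// other shape, including callsite lines like "33: atoi:0".
bool parseBodyLine(const std::string &Line, uint32_t &Offset,
                   uint64_t &Count) {
  std::istringstream In(Line);
  char Colon = 0;
  if (!(In >> Offset >> Colon) || Colon != ':')
    return false;
  if (!(In >> Count))
    return false;
  In >> std::ws;   // skip trailing whitespace, if any
  return In.eof(); // reject lines with extra trailing tokens
}
```

A line such as `6: 0` parses to an explicit zero count, which is exactly the kind of entry this revision starts emitting for probes without sample hits.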

hoy edited the summary of this revision. Jun 15 2021, 11:51 PM

Curious why we can remove moveAndDanglePseudoProbes (i.e., why we don't need the concept of dangling)? My understanding is that we treat dangling as unknown, and at pass1 we need to pass this information to the profiled binary. An optimized-out probe won't have samples, so it will be treated as zero if not marked as dangling.

Harbormaster completed remote builds in B109455: Diff 352342. Jun 16 2021, 6:22 AM

In D104129#2821245, @wlei wrote:

Curious why we can remove moveAndDanglePseudoProbes (i.e., why we don't need the concept of dangling)? My understanding is that we treat dangling as unknown, and at pass1 we need to pass this information to the profiled binary. An optimized-out probe won't have samples, so it will be treated as zero if not marked as dangling.

We now use "missing" to represent dangling which is natural, and "missing" is different from zero. The counts inferencer will reason about counts for missing probes. We will flip the concept of dangling to real dead, and likely make a routine like moveAndKillPseudoProbes.

In D104129#2822242, @hoy wrote:

We now use "missing" to represent dangling which is natural, and "missing" is different from zero. The counts inferencer will reason about counts for missing probes. We will flip the concept of dangling to real dead, and likely make a routine like moveAndKillPseudoProbes.

I see, thanks for the clarification!

LGTM, thanks!

Fixing clang-tidy issue.

In D104129#2821201, @wenlei wrote:

Non-cs dwarf-based profile normally doesn't have zero counts in the profile. Any count below a certain threshold is omitted in the profile to save size. This is the default behavior for our production usage (I believe for google as well). This is why the current change that needs explicit zero is not consistent with afdo.

Google keeps zero counts in the profile. It treats missing lines conservatively and tries to infer their hotness. I vaguely remember there is a flag added to control it.

In "6. Be consistent with non-CS dwarf line number based profile", if you can mention how the compiler treats missing line numbers and missing pseudo probes in the description as a record, that will be helpful.

Google keeps zero counts in the profile. It treats missing lines conservatively and tries to infer their hotness. I vaguely remember there is a flag added to control it.

Thanks for pointing out. Actually Hongtao and I went back and checked our setup, we also keep lines with zero. I was confused with the threshold on total samples for function. Sorry about that.

In "6. Be consistent with non-CS dwarf line number based profile", if you can mention how the compiler treats missing line numbers and missing pseudo probes in the description as a record, that will be helpful.

+1.

hoy edited the summary of this revision. Jun 16 2021, 11:29 AM

Herald added a subscriber: JDevlieghere. Jun 16 2021, 11:29 AM

This revision was landed with ongoing or failed builds. Jun 16 2021, 11:46 AM

Closed by commit rGcef9b96b01b7: [CSSPGO] Report zero-count probe in profile instead of dangling probes. (authored by hoy).

This revision was automatically updated to reflect the committed changes.

hoy added a commit: rGcef9b96b01b7: [CSSPGO] Report zero-count probe in profile instead of dangling probes.

Harbormaster completed remote builds in B109540: Diff 352469. Jun 16 2021, 10:21 PM

hoy mentioned this in D104477: [CSSPGO] Undoing the concept of dangling pseudo probe. Jun 17 2021, 11:21 AM

hoy mentioned this in rGbd5249551880: [CSSPGO] Undoing the concept of dangling pseudo probe. Jun 18 2021, 3:14 PM

Revision Contents

Path	Size
llvm/include/llvm/ProfileData/SampleProf.h	20 lines
llvm/lib/ProfileData/ProfileSummaryBuilder.cpp	2 lines
llvm/lib/ProfileData/SampleProf.cpp	11 lines
llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-inline.prof	6 lines
llvm/test/tools/llvm-profgen/fname-canonicalization.test	2 lines
llvm/test/tools/llvm-profgen/inline-cs-dangling-pseudoprobe.test	7 lines
llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test	5 lines
llvm/test/tools/llvm-profgen/merge-cold-profile.test	17 lines
llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test	6 lines
llvm/test/tools/llvm-profgen/truncated-pseudoprobe.test	6 lines
llvm/tools/llvm-profgen/ProfileGenerator.cpp	15 lines

Diff 352496

llvm/include/llvm/ProfileData/SampleProf.h

[unchanged lines omitted; enclosing context: public:]
   }

   /// Return the number of samples collected at the given location.
   /// Each location is specified by \p LineOffset and \p Discriminator.
   /// If the location is not found in profile, return error.
   ErrorOr<uint64_t> findSamplesAt(uint32_t LineOffset,
                                   uint32_t Discriminator) const {
     const auto &ret = BodySamples.find(LineLocation(LineOffset, Discriminator));
-    if (ret == BodySamples.end()) {
-      // For CSSPGO, in order to conserve profile size, we no longer write out
-      // locations profile for those not hit during training, so we need to
-      // treat them as zero instead of error here.
-      if (FunctionSamples::ProfileIsCS || FunctionSamples::ProfileIsProbeBased)
-        return 0;
-      return std::error_code();
-    } else {
-      // Return error for an invalid sample count which is usually assigned to
-      // dangling probe.
-      if (FunctionSamples::ProfileIsProbeBased &&
-          ret->second.getSamples() == FunctionSamples::InvalidProbeCount)
-        return std::error_code();
-      return ret->second.getSamples();
-    }
+    if (ret == BodySamples.end())
+      return std::error_code();
+    return ret->second.getSamples();
   }

   /// Returns the call target map collected at a given location.
   /// Each location is specified by \p LineOffset and \p Discriminator.
   /// If the location is not found in profile, return error.
   ErrorOr<SampleRecord::CallTargetMap>
   findCallTargetMapAt(uint32_t LineOffset, uint32_t Discriminator) const {
     const auto &ret = BodySamples.find(LineLocation(LineOffset, Discriminator));
     if (ret == BodySamples.end())
[unchanged lines omitted; enclosing context: public:]
   ///
   /// \returns the FunctionSamples pointer to the inlined instance.
   /// If \p Remapper is not nullptr, it will be used to find matching
   /// FunctionSamples with not exactly the same but equivalent name.
   const FunctionSamples *findFunctionSamples(
       const DILocation *DIL,
       SampleProfileReaderItaniumRemapper *Remapper = nullptr) const;

-  // The invalid sample count is used to represent samples collected for a
-  // dangling probe.
-  static constexpr uint64_t InvalidProbeCount = UINT64_MAX;
-
   static bool ProfileIsProbeBased;

   static bool ProfileIsCS;

   SampleContext &getContext() const { return Context; }

   void setContext(const SampleContext &FContext) { Context = FContext; }

[unchanged lines omitted]

llvm/lib/ProfileData/ProfileSummaryBuilder.cpp

[unchanged lines omitted]
 void SampleProfileSummaryBuilder::addRecord(
     const sampleprof::FunctionSamples &FS, bool isCallsiteSample) {
   if (!isCallsiteSample) {
     NumFunctions++;
     if (FS.getHeadSamples() > MaxFunctionCount)
       MaxFunctionCount = FS.getHeadSamples();
   }
   for (const auto &I : FS.getBodySamples()) {
     uint64_t Count = I.second.getSamples();
-    if (!sampleprof::FunctionSamples::ProfileIsProbeBased ||
-        (Count != sampleprof::FunctionSamples::InvalidProbeCount))
-      addCount(Count);
+    addCount(Count);
   }
   for (const auto &I : FS.getCallsiteSamples())
     for (const auto &CS : I.second)
       addRecord(CS.second, true);
 }

 // The argument to this method is a vector of cutoff percentages and the return
[unchanged lines omitted]

llvm/lib/ProfileData/SampleProf.cpp

[unchanged lines omitted; enclosing context: raw_ostream &llvm::sampleprof::operator<<(raw_ostream &OS,]
   return OS;
 }

 /// Merge the samples in \p Other into this record.
 /// Optionally scale sample counts by \p Weight.
 sampleprof_error SampleRecord::merge(const SampleRecord &Other,
                                      uint64_t Weight) {
   sampleprof_error Result;
-  // With pseudo probes, merge a dangling sample with a non-dangling sample
-  // should result in a dangling sample.
-  if (FunctionSamples::ProfileIsProbeBased &&
-      (getSamples() == FunctionSamples::InvalidProbeCount ||
-       Other.getSamples() == FunctionSamples::InvalidProbeCount)) {
-    NumSamples = FunctionSamples::InvalidProbeCount;
-    Result = sampleprof_error::success;
-  } else {
-    Result = addSamples(Other.getSamples(), Weight);
-  }
+  Result = addSamples(Other.getSamples(), Weight);
   for (const auto &I : Other.getCallTargets()) {
     MergeResult(Result, addCalledTarget(I.first(), I.second, Weight));
   }
   return Result;
 }

 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
 LLVM_DUMP_METHOD void LineLocation::dump() const { print(dbgs()); }
[unchanged lines omitted]
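With the probe-specific branch gone, `SampleRecord::merge` in this diff reduces to a weighted add of `Other.getSamples()`. The sketch below shows one way such a weighted add can saturate instead of wrapping. It is a self-contained illustration under that assumption; `saturatingMultiplyAdd` here is a hypothetical stand-in written for this note, not the helper that `addSamples` actually calls.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in: compute Acc + Samples * Weight, clamping to the
// maximum uint64_t value and reporting overflow instead of wrapping around.
uint64_t saturatingMultiplyAdd(uint64_t Samples, uint64_t Weight,
                               uint64_t Acc, bool &Overflowed) {
  const uint64_t Max = std::numeric_limits<uint64_t>::max();
  Overflowed = false;
  // Samples * Weight overflows exactly when Weight > Max / Samples.
  if (Samples != 0 && Weight > Max / Samples) {
    Overflowed = true;
    return Max;
  }
  const uint64_t Product = Samples * Weight;
  // Acc + Product overflows exactly when Product > Max - Acc.
  if (Product > Max - Acc) {
    Overflowed = true;
    return Max;
  }
  return Acc + Product;
}
```

In this model a record holding 10 samples merged with 5 samples at weight 2 ends up with 20, and no count value is reserved to mean "dangling".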

llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-inline.prof

 [foo]:23:23
  1: 23
  2: 23 zen:23
  !CFGChecksum: 281479271677951
 [foo:2 @ zen]:765858:23
  1: 23
  2: 382920
  3: 382915
+ 4: 0
+ 5: 0
+ 6: 0
  !CFGChecksum: 138828622701
 [bar]:23:23
  1: 23
  2: 23 zen:23
  !CFGChecksum: 281479271677951
 [bar:2 @ zen]:765858:23
  1: 23
  2: 382920
  3: 382915
+ 4: 0
+ 5: 0
+ 6: 0
  !CFGChecksum: 138828622701
 \ No newline at end of file

llvm/test/tools/llvm-profgen/fname-canonicalization.test

[unchanged lines omitted]
 ; CHECK-PROBE-FNAME: 2: 15
 ; CHECK-PROBE-FNAME: 3: 15
 ; CHECK-PROBE-FNAME: 4: 15
 ; CHECK-PROBE-FNAME: 6: 15
 ; CHECK-PROBE-FNAME: 8: 15 _ZL3barii.__uniq.26267048767521081047744692097241227776:15
 ; CHECK-PROBE-FNAME: !CFGChecksum: 563088904013236
 ; CHECK-PROBE-FNAME:[main:2 @ foo:8 @ _ZL3barii.__uniq.26267048767521081047744692097241227776]:30:15
 ; CHECK-PROBE-FNAME: 1: 15
-; CHECK-PROBE-FNAME: 2: 18446744073709551615
-; CHECK-PROBE-FNAME: 3: 18446744073709551615
 ; CHECK-PROBE-FNAME: 4: 15
 ; CHECK-PROBE-FNAME: !CFGChecksum: 72617220756


 ; Original code:
 ; Dwarf: clang -O3 -funique-internal-linkage-names -g test.c -o a.out
 ; Probe: clang -O3 -funique-internal-linkage-names -fexperimental-new-pass-manager -fuse-ld=lld -fpseudo-probe-for-profiling -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Xclang -mdisable-tail-calls -g test.c -o a.out

[unchanged lines omitted]

llvm/test/tools/llvm-profgen/inline-cs-dangling-pseudoprobe.test

	; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/inline-cs-dangling-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --show-unwinder-output --profile-summary-cold-count=0 \| FileCheck %s --check-prefix=CHECK-UNWINDER			; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/inline-cs-dangling-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --show-unwinder-output --profile-summary-cold-count=0 \| FileCheck %s --check-prefix=CHECK-UNWINDER
	; RUN: FileCheck %s --input-file %t			; RUN: FileCheck %s --input-file %t

	; CHECK: [main:2 @ foo]:58:0			; CHECK: [main:2 @ foo]:58:0
				; CHECK-NEXT: 1: 0
	; CHECK-NEXT: 2: 15			; CHECK-NEXT: 2: 15
	; CHECK-NEXT: 3: 14			; CHECK-NEXT: 3: 14
				; CHECK-NEXT: 4: 0
	; CHECK-NEXT: 5: 14			; CHECK-NEXT: 5: 14
	; CHECK-NEXT: 6: 15			; CHECK-NEXT: 6: 15
				; CHECK-NEXT: 7: 0
				; CHECK-NEXT: 9: 0
	; CHECK-NEXT: !CFGChecksum: 138950591924			; CHECK-NEXT: !CFGChecksum: 138950591924
	; CHECK:[main:2 @ foo:8 @ bar]:1:0			; CHECK:[main:2 @ foo:8 @ bar]:1:0
	; CHECK-NEXT: 2: 18446744073709551615			; CHECK-NEXT: 1: 0
	; CHECK-NEXT: 3: 18446744073709551615
	; CHECK-NEXT: 4: 1			; CHECK-NEXT: 4: 1
	; CHECK-NEXT: !CFGChecksum: 72617220756			; CHECK-NEXT: !CFGChecksum: 72617220756


	; CHECK-UNWINDER: Binary(inline-cs-pseudoprobe.perfbin)'s Range Counter:			; CHECK-UNWINDER: Binary(inline-cs-pseudoprobe.perfbin)'s Range Counter:
	; CHECK-UNWINDER-EMPTY:			; CHECK-UNWINDER-EMPTY:
	; CHECK-UNWINDER-NEXT: (800, 82b): 14			; CHECK-UNWINDER-NEXT: (800, 82b): 14
	; CHECK-UNWINDER-NEXT: (84d, 858): 1			; CHECK-UNWINDER-NEXT: (84d, 858): 1

llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test

	; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --show-unwinder-output --profile-summary-cold-count=0 | FileCheck %s --check-prefix=CHECK-UNWINDER			; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --show-unwinder-output --profile-summary-cold-count=0 | FileCheck %s --check-prefix=CHECK-UNWINDER
	; RUN: FileCheck %s --input-file %t			; RUN: FileCheck %s --input-file %t

	; CHECK: [main:2 @ foo]:74:0			; CHECK: [main:2 @ foo]:74:0
				; CHECK-NEXT: 1: 0
	; CHECK-NEXT: 2: 15			; CHECK-NEXT: 2: 15
	; CHECK-NEXT: 3: 15			; CHECK-NEXT: 3: 15
	; CHECK-NEXT: 4: 14			; CHECK-NEXT: 4: 14
	; CHECK-NEXT: 5: 1			; CHECK-NEXT: 5: 1
	; CHECK-NEXT: 6: 15			; CHECK-NEXT: 6: 15
				; CHECK-NEXT: 7: 0
	; CHECK-NEXT: 8: 14 bar:14			; CHECK-NEXT: 8: 14 bar:14
				; CHECK-NEXT: 9: 0
	; CHECK-NEXT: !CFGChecksum: 138950591924			; CHECK-NEXT: !CFGChecksum: 138950591924
	; CHECK:[main:2 @ foo:8 @ bar]:28:14			; CHECK:[main:2 @ foo:8 @ bar]:28:14
	; CHECK-NEXT: 1: 14			; CHECK-NEXT: 1: 14
	; CHECK-NEXT: 2: 18446744073709551615
	; CHECK-NEXT: 3: 18446744073709551615
	; CHECK-NEXT: 4: 14			; CHECK-NEXT: 4: 14
	; CHECK-NEXT: !CFGChecksum: 72617220756			; CHECK-NEXT: !CFGChecksum: 72617220756


	; CHECK-UNWINDER: Binary(inline-cs-pseudoprobe.perfbin)'s Range Counter:			; CHECK-UNWINDER: Binary(inline-cs-pseudoprobe.perfbin)'s Range Counter:
	; CHECK-UNWINDER-EMPTY:			; CHECK-UNWINDER-EMPTY:
	; CHECK-UNWINDER-NEXT: (800, 858): 1			; CHECK-UNWINDER-NEXT: (800, 858): 1
	; CHECK-UNWINDER-NEXT: (80e, 82b): 1			; CHECK-UNWINDER-NEXT: (80e, 82b): 1

llvm/test/tools/llvm-profgen/merge-cold-profile.test

	; RUN: FileCheck %s --input-file %t3 --check-prefix=CHECK-UNMERGED			; RUN: FileCheck %s --input-file %t3 --check-prefix=CHECK-UNMERGED

	; Test --csprof-frame-depth-for-cold-context			; Test --csprof-frame-depth-for-cold-context
	; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/recursion-compression-pseudoprobe.perfscript --binary=%S/Inputs/recursion-compression-pseudoprobe.perfbin --output=%t2 --compress-recursion=-1 --profile-summary-cold-count=100 --csprof-trim-cold-context=0 --csprof-frame-depth-for-cold-context=2			; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/recursion-compression-pseudoprobe.perfscript --binary=%S/Inputs/recursion-compression-pseudoprobe.perfbin --output=%t2 --compress-recursion=-1 --profile-summary-cold-count=100 --csprof-trim-cold-context=0 --csprof-frame-depth-for-cold-context=2
	; RUN: FileCheck %s --input-file %t2 --check-prefix=CHECK-COLD-CONTEXT-LENGTH			; RUN: FileCheck %s --input-file %t2 --check-prefix=CHECK-COLD-CONTEXT-LENGTH

	; CHECK: [fa]:14:4			; CHECK: [fa]:14:4
	; CHECK-NEXT: 1: 4			; CHECK-NEXT: 1: 4
	; CHECK-NEXT: 2: 18446744073709551615
	; CHECK-NEXT: 3: 4			; CHECK-NEXT: 3: 4
	; CHECK-NEXT: 4: 2			; CHECK-NEXT: 4: 2
	; CHECK-NEXT: 5: 1			; CHECK-NEXT: 5: 1
				; CHECK-NEXT: 6: 0
	; CHECK-NEXT: 7: 2 fb:2			; CHECK-NEXT: 7: 2 fb:2
	; CHECK-NEXT: 8: 1 fa:1			; CHECK-NEXT: 8: 1 fa:1
	; CHECK-NEXT: !CFGChecksum: 120515930909			; CHECK-NEXT: !CFGChecksum: 120515930909
	; CHECK-NEXT: !Attributes: 0			; CHECK-NEXT: !Attributes: 0
	; CHECK-NEXT:[main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5 @ fb]:13:4			; CHECK-NEXT:[main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5 @ fb]:13:4
	; CHECK-NEXT: 1: 4			; CHECK-NEXT: 1: 4
	; CHECK-NEXT: 2: 3			; CHECK-NEXT: 2: 3
	; CHECK-NEXT: 3: 1			; CHECK-NEXT: 3: 1
				; CHECK-NEXT: 4: 0
	; CHECK-NEXT: 5: 4 fb:4			; CHECK-NEXT: 5: 4 fb:4
	; CHECK-NEXT: 6: 1 fa:1			; CHECK-NEXT: 6: 1 fa:1
	; CHECK-NEXT: !CFGChecksum: 72617220756			; CHECK-NEXT: !CFGChecksum: 72617220756

	; CHECK-KEEP-COLD: [fb]:19:6			; CHECK-KEEP-COLD: [fb]:19:6
	; CHECK-KEEP-COLD-NEXT: 1: 6			; CHECK-KEEP-COLD-NEXT: 1: 6
	; CHECK-KEEP-COLD-NEXT: 2: 3			; CHECK-KEEP-COLD-NEXT: 2: 3
	; CHECK-KEEP-COLD-NEXT: 3: 3			; CHECK-KEEP-COLD-NEXT: 3: 3
				; CHECK-KEEP-COLD-NEXT: 4: 0
	; CHECK-KEEP-COLD-NEXT: 5: 4 fb:4			; CHECK-KEEP-COLD-NEXT: 5: 4 fb:4
	; CHECK-KEEP-COLD-NEXT: 6: 3 fa:3			; CHECK-KEEP-COLD-NEXT: 6: 3 fa:3
	; CHECK-KEEP-COLD-NEXT: !CFGChecksum: 72617220756			; CHECK-KEEP-COLD-NEXT: !CFGChecksum: 72617220756
	; CHECK-KEEP-COLD-NEXT: !Attributes: 0			; CHECK-KEEP-COLD-NEXT: !Attributes: 0
	; CHECK-KEEP-COLD-NEXT:[fa]:14:4			; CHECK-KEEP-COLD-NEXT:[fa]:14:4
	; CHECK-KEEP-COLD-NEXT: 1: 4			; CHECK-KEEP-COLD-NEXT: 1: 4
	; CHECK-KEEP-COLD-NEXT: 2: 18446744073709551615
	; CHECK-KEEP-COLD-NEXT: 3: 4			; CHECK-KEEP-COLD-NEXT: 3: 4
	; CHECK-KEEP-COLD-NEXT: 4: 2			; CHECK-KEEP-COLD-NEXT: 4: 2
	; CHECK-KEEP-COLD-NEXT: 5: 1			; CHECK-KEEP-COLD-NEXT: 5: 1
				; CHECK-KEEP-COLD-NEXT: 6: 0
	; CHECK-KEEP-COLD-NEXT: 7: 2 fb:2			; CHECK-KEEP-COLD-NEXT: 7: 2 fb:2
	; CHECK-KEEP-COLD-NEXT: 8: 1 fa:1			; CHECK-KEEP-COLD-NEXT: 8: 1 fa:1
	; CHECK-KEEP-COLD-NEXT: !CFGChecksum: 120515930909			; CHECK-KEEP-COLD-NEXT: !CFGChecksum: 120515930909

	; CHECK-UNMERGED: [main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5 @ fb]:13:4			; CHECK-UNMERGED: [main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5 @ fb]:13:4
	; CHECK-UNMERGED-NEXT: 1: 4			; CHECK-UNMERGED-NEXT: 1: 4
	; CHECK-UNMERGED-NEXT: 2: 3			; CHECK-UNMERGED-NEXT: 2: 3
	; CHECK-UNMERGED-NEXT: 3: 1			; CHECK-UNMERGED-NEXT: 3: 1
				; CHECK-UNMERGED-NEXT: 4: 0
	; CHECK-UNMERGED-NEXT: 5: 4 fb:4			; CHECK-UNMERGED-NEXT: 5: 4 fb:4
	; CHECK-UNMERGED-NEXT: 6: 1 fa:1			; CHECK-UNMERGED-NEXT: 6: 1 fa:1
	; CHECK-UNMERGED-NEXT: !CFGChecksum: 72617220756			; CHECK-UNMERGED-NEXT: !CFGChecksum: 72617220756
	; CHECK-UNMERGED-NOT: [fa]			; CHECK-UNMERGED-NOT: [fa]
	; CHECK-UNMERGED-NOT: [fb]			; CHECK-UNMERGED-NOT: [fb]

	; CHECK-COLD-CONTEXT-LENGTH: [fb:5 @ fb]:13:4			; CHECK-COLD-CONTEXT-LENGTH: [fb:5 @ fb]:13:4
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 1: 4			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 1: 4
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 2: 3			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 2: 3
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 3: 1			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 3: 1
				; CHECK-COLD-CONTEXT-LENGTH-NEXT: 4: 0
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 5: 4 fb:4			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 5: 4 fb:4
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 6: 1 fa:1			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 6: 1 fa:1
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: !CFGChecksum: 72617220756			; CHECK-COLD-CONTEXT-LENGTH-NEXT: !CFGChecksum: 72617220756
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: !Attributes: 0			; CHECK-COLD-CONTEXT-LENGTH-NEXT: !Attributes: 0
	; CHECK-COLD-CONTEXT-LENGTH-NEXT:[fb:6 @ fa]:10:3			; CHECK-COLD-CONTEXT-LENGTH-NEXT:[fb:6 @ fa]:10:3
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 1: 3			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 1: 3
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 2: 18446744073709551615
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 3: 3			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 3: 3
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 4: 1			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 4: 1
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 5: 1			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 5: 1
				; CHECK-COLD-CONTEXT-LENGTH-NEXT: 6: 0
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 7: 1 fb:1			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 7: 1 fb:1
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 8: 1 fa:1			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 8: 1 fa:1
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: !CFGChecksum: 120515930909			; CHECK-COLD-CONTEXT-LENGTH-NEXT: !CFGChecksum: 120515930909
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: !Attributes: 0			; CHECK-COLD-CONTEXT-LENGTH-NEXT: !Attributes: 0
	; CHECK-COLD-CONTEXT-LENGTH-NEXT:[fa:7 @ fb]:6:2			; CHECK-COLD-CONTEXT-LENGTH-NEXT:[fa:7 @ fb]:6:2
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 1: 2			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 1: 2
				; CHECK-COLD-CONTEXT-LENGTH-NEXT: 2: 0
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 3: 2			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 3: 2
				; CHECK-COLD-CONTEXT-LENGTH-NEXT: 4: 0
				; CHECK-COLD-CONTEXT-LENGTH-NEXT: 5: 0
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 6: 2 fa:2			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 6: 2 fa:2
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: !CFGChecksum: 72617220756			; CHECK-COLD-CONTEXT-LENGTH-NEXT: !CFGChecksum: 72617220756
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: !Attributes: 0			; CHECK-COLD-CONTEXT-LENGTH-NEXT: !Attributes: 0
	; CHECK-COLD-CONTEXT-LENGTH-NEXT:[fa:8 @ fa]:4:1			; CHECK-COLD-CONTEXT-LENGTH-NEXT:[fa:8 @ fa]:4:1
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 1: 1			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 1: 1
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 2: 18446744073709551615
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 3: 1			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 3: 1
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 4: 1			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 4: 1
				; CHECK-COLD-CONTEXT-LENGTH-NEXT: 5: 0
				; CHECK-COLD-CONTEXT-LENGTH-NEXT: 6: 0
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: 7: 1 fb:1			; CHECK-COLD-CONTEXT-LENGTH-NEXT: 7: 1 fb:1
				; CHECK-COLD-CONTEXT-LENGTH-NEXT: 8: 0
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: !CFGChecksum: 120515930909			; CHECK-COLD-CONTEXT-LENGTH-NEXT: !CFGChecksum: 120515930909
	; CHECK-COLD-CONTEXT-LENGTH-NEXT: !Attributes: 0			; CHECK-COLD-CONTEXT-LENGTH-NEXT: !Attributes: 0

	; clang -O3 -fexperimental-new-pass-manager -fuse-ld=lld -fpseudo-probe-for-profiling			; clang -O3 -fexperimental-new-pass-manager -fuse-ld=lld -fpseudo-probe-for-profiling
	; -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Xclang -mdisable-tail-calls			; -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Xclang -mdisable-tail-calls
	; -g test.c -o a.out			; -g test.c -o a.out

	; Copied from recursion-compression.test			; Copied from recursion-compression.test

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test

	; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/noinline-cs-pseudoprobe.perfscript --binary=%S/Inputs/noinline-cs-pseudoprobe.perfbin --output=%t --show-unwinder-output --profile-summary-cold-count=0 | FileCheck %s --check-prefix=CHECK-UNWINDER			; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/noinline-cs-pseudoprobe.perfscript --binary=%S/Inputs/noinline-cs-pseudoprobe.perfbin --output=%t --show-unwinder-output --profile-summary-cold-count=0 | FileCheck %s --check-prefix=CHECK-UNWINDER
	; RUN: FileCheck %s --input-file %t			; RUN: FileCheck %s --input-file %t

	; CHECK: [main:2 @ foo]:75:0			; CHECK: [main:2 @ foo]:75:0
				; CHECK-NEXT: 1: 0
	; CHECK-NEXT: 2: 15			; CHECK-NEXT: 2: 15
	; CHECK-NEXT: 3: 15			; CHECK-NEXT: 3: 15
	; CHECK-NEXT: 4: 15			; CHECK-NEXT: 4: 15
				; CHECK-NEXT: 5: 0
	; CHECK-NEXT: 6: 15			; CHECK-NEXT: 6: 15
				; CHECK-NEXT: 7: 0
	; CHECK-NEXT: 8: 15 bar:15			; CHECK-NEXT: 8: 15 bar:15
				; CHECK-NEXT: 9: 0
	; CHECK-NEXT: !CFGChecksum: 138950591924			; CHECK-NEXT: !CFGChecksum: 138950591924
	; CHECK:[main:2 @ foo:8 @ bar]:30:15			; CHECK:[main:2 @ foo:8 @ bar]:30:15
	; CHECK-NEXT: 1: 15			; CHECK-NEXT: 1: 15
	; CHECK-NEXT: 2: 18446744073709551615
	; CHECK-NEXT: 3: 18446744073709551615
	; CHECK-NEXT: 4: 15			; CHECK-NEXT: 4: 15
	; CHECK-NEXT: !CFGChecksum: 72617220756			; CHECK-NEXT: !CFGChecksum: 72617220756


	; CHECK-UNWINDER: Binary(noinline-cs-pseudoprobe.perfbin)'s Range Counter:			; CHECK-UNWINDER: Binary(noinline-cs-pseudoprobe.perfbin)'s Range Counter:
	; CHECK-UNWINDER-NEXT: main:2			; CHECK-UNWINDER-NEXT: main:2
	; CHECK-UNWINDER-NEXT: (79e, 7bf): 15			; CHECK-UNWINDER-NEXT: (79e, 7bf): 15
	; CHECK-UNWINDER-NEXT: (7c4, 7cf): 15			; CHECK-UNWINDER-NEXT: (7c4, 7cf): 15

llvm/test/tools/llvm-profgen/truncated-pseudoprobe.test

	; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/truncated-pseudoprobe.perfscript --binary=%S/Inputs/truncated-pseudoprobe.perfbin --output=%t			; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/truncated-pseudoprobe.perfscript --binary=%S/Inputs/truncated-pseudoprobe.perfbin --output=%t
	; RUN: FileCheck %s --input-file %t			; RUN: FileCheck %s --input-file %t

	; CHECK: [foo]:75:0			; CHECK: [foo]:75:0
				; CHECK-NEXT: 1: 0
	; CHECK-NEXT: 2: 15			; CHECK-NEXT: 2: 15
	; CHECK-NEXT: 3: 15			; CHECK-NEXT: 3: 15
	; CHECK-NEXT: 4: 15			; CHECK-NEXT: 4: 15
				; CHECK-NEXT: 5: 0
	; CHECK-NEXT: 6: 15			; CHECK-NEXT: 6: 15
				; CHECK-NEXT: 7: 0
	; CHECK-NEXT: 8: 15 bar:15			; CHECK-NEXT: 8: 15 bar:15
				; CHECK-NEXT: 9: 0
	; CHECK-NEXT: !CFGChecksum: 563088904013236			; CHECK-NEXT: !CFGChecksum: 563088904013236
	; CHECK-NEXT: !Attributes: 0			; CHECK-NEXT: !Attributes: 0
	; CHECK: [foo:8 @ bar]:30:15			; CHECK: [foo:8 @ bar]:30:15
	; CHECK-NEXT: 1: 15			; CHECK-NEXT: 1: 15
	; CHECK-NEXT: 2: 18446744073709551615
	; CHECK-NEXT: 3: 18446744073709551615
	; CHECK-NEXT: 4: 15			; CHECK-NEXT: 4: 15
	; CHECK-NEXT: !CFGChecksum: 72617220756			; CHECK-NEXT: !CFGChecksum: 72617220756
	; CHECK-NEXT: !Attributes: 1			; CHECK-NEXT: !Attributes: 1

	; truncated-pseudoprobe.perfbin is from the following compile commands:			; truncated-pseudoprobe.perfbin is from the following compile commands:
	; llc -pseudo-probe-for-profiling truncated-pseudoprobe.ll -filetype=obj -o truncated-pseudoprobe.o			; llc -pseudo-probe-for-profiling truncated-pseudoprobe.ll -filetype=obj -o truncated-pseudoprobe.o
	; clang truncated-pseudoprobe.o -o truncated-pseudoprobe.perfbin			; clang truncated-pseudoprobe.o -o truncated-pseudoprobe.perfbin

llvm/tools/llvm-profgen/ProfileGenerator.cpp

	if (Probe->isEntry()) {
CallerProfile.addBodySamples(CallerIndex, 0, Count);		CallerProfile.addBodySamples(CallerIndex, 0, Count);
CallerProfile.addTotalSamples(Count);		CallerProfile.addTotalSamples(Count);
CallerProfile.addCalledTargetSamples(		CallerProfile.addCalledTargetSamples(
CallerIndex, 0,		CallerIndex, 0,
FunctionProfile.getContext().getNameWithoutContext(), Count);		FunctionProfile.getContext().getNameWithoutContext(), Count);
}		}
}		}

// Report dangling probes for frames that have real samples collected.		// Assign zero count for remaining probes without sample hits to
// Dangling probes are the probes associated to an empty block. With this		// differentiate from probes optimized away, of which the counts are unknown
// place holder, sample count on a dangling probe will not be trusted by the		// and will be inferred by the compiler.
		wenlei: "Mainly for saving profile size" can be a misleading comment. The main benefits we see are: 1) better profile quality when we default all missing probes to unknown (previously we treated only marked probes as unknown), since we have more unknown probes than dead probes; 2) allowing probes with counts to take precedence over dangling ones when merging. If doing this regressed profile quality, we probably wouldn't do it even if it led to a smaller profile. If we go further down this route, we may end up removing `InvalidProbeCount` altogether, and then a comment about saving profile size would confuse readers who don't know where we came from.
		hoy (author): Sounds good. Will update comment.
		wenlei: If you mention dangling in the comments, make sure it gets updated when we remove the concept of dangling altogether. What I actually meant is that we don't need to explain the benefit compared to the old approach, since the old approach will be completely gone. Just comment it as if we arrived at this from a clean slate, in which case there should be no place for dangling probes. The extra context is good for the commit message, but it's not so relevant for the comment. Here we're simply marking probes that don't have any sample hits with zero count so we can differentiate probes with a known count from unknown ones (deleted or for some other reason).
		hoy (author): I can remove the dangling term now, but feel that we need another term for the missing probes, or to mention why zero is reported. Maybe something like "Reporting zero for non-executed probes. The compiler will infer counts for deleted probes"?
		wenlei: Yeah, something like "Explicitly assign zero count for remaining probes without sample hits. This is to differentiate from removed probes whose count is unknown and will be inferred in the compiler". A missing probe is a probe that was optimized away; that should be intuitive. Not sure if we need a special term for it. One thing is the removal of the term "dangling"; the other is that we don't need to explain all the benefits when the base will be gone.
// compiler and we will rely on the counts inference algorithm to get the
// probe a reasonable count. Use InvalidProbeCount to mark sample count for
// a dangling probe.
for (auto &I : FrameSamples) {		for (auto &I : FrameSamples) {
auto *FunctionProfile = I.second;		auto *FunctionProfile = I.second;
for (auto *Probe : I.first->getProbes()) {		for (auto *Probe : I.first->getProbes()) {
if (Probe->isDangling()) {		if (!Probe->isDangling())
FunctionProfile->addBodySamplesForProbe(		FunctionProfile->addBodySamplesForProbe(Probe->Index, 0);
		wenlei: Now that we don't do anything for dangling probes, can we remove `moveAndDanglePseudoProbes` from the compiler too? Those probes will simply be missing, rather than having `0`, without any special handling. Furthermore, this is the only place we generate `InvalidProbeCount`; with this change, can all the special casing for `InvalidProbeCount` be removed too?
		hoy (author): Yes, planning to do it separately. There was the original patch that introduced dangling probes. Hopefully we can just revert it. What do you think?
Probe->Index, FunctionSamples::InvalidProbeCount);
}
}		}
}		}
}		}
}		}

void PseudoProbeCSProfileGenerator::populateBoundarySamplesWithProbes(		void PseudoProbeCSProfileGenerator::populateBoundarySamplesWithProbes(
const BranchSample &BranchCounter,		const BranchSample &BranchCounter,
SmallVectorImpl<std::string> &ContextStrStack, ProfiledBinary *Binary) {		SmallVectorImpl<std::string> &ContextStrStack, ProfiledBinary *Binary) {
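
Note on the zero-count convention discussed in the inline comments above: the idea can be sketched outside of LLVM as follows. This is a minimal illustration under stated assumptions, not the llvm-profgen implementation. `ProbeCounts`, `assignZeroToUnsampledProbes`, `lookupCount` and `mergeProfiles` are hypothetical names invented for this sketch, and a plain `std::map` stands in for the real `FunctionSamples` body samples. It needs C++17.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the real llvm-profgen types.
using ProbeIndex = std::uint32_t;
using ProbeCounts = std::map<ProbeIndex, std::uint64_t>;

// Generator side: every probe that still exists in the binary gets an
// entry, so probes without sample hits end up with an explicit zero.
// Probes that were optimized away are not in ProbesInBinary and
// therefore stay absent from the profile.
void assignZeroToUnsampledProbes(const std::vector<ProbeIndex> &ProbesInBinary,
                                 ProbeCounts &Profile) {
  for (ProbeIndex Index : ProbesInBinary)
    Profile.emplace(Index, 0); // no-op when the probe already has samples
}

// Reader side: a missing entry means "unknown" (to be inferred later),
// which is different from a known count of zero.
std::optional<std::uint64_t> lookupCount(const ProbeCounts &Profile,
                                         ProbeIndex Index) {
  auto It = Profile.find(Index);
  if (It == Profile.end())
    return std::nullopt;
  return It->second;
}

// Merging: counts add up, and a probe that is known in either input is
// known in the result. No sentinel value needs special handling.
ProbeCounts mergeProfiles(const ProbeCounts &A, const ProbeCounts &B) {
  ProbeCounts Result = A;
  for (const auto &[Index, Count] : B)
    Result[Index] += Count;
  return Result;
}
```

With the `bar` context from inline-cs-pseudoprobe.test as an example, probes 1 and 4 carry counts while probes 2 and 3 no longer appear in the expected output; in this sketch `lookupCount` would return `std::nullopt` for 2 and 3, leaving their counts to inference.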


Diff 352496
