This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/include/llvm/ProfileData/
-
include/
-
llvm/
-
ProfileData/
1/1
SampleProf.h

Differential D158689

[llvm-profdata] Use llvm::DenseMap in SampleProfileMap
Needs ReviewPublic

Authored by huangjd on Aug 23 2023, 6:22 PM.

Download Raw Diff

Details

Reviewers

davidxl
snehasish
hoy
wenlei
ayermolo

Summary

Changing the container type for SampleProfileMap from std::unordered_map to llvm::DenseMap. This brings up to 8% speed up (31.4s vs 29.0s) when reading a large test profile, and 5% speedup (0.82s vs 0.78s) when reading the function offset table alone.

Since DenseMap does not guarantee reference validity after insertion (when rehash happens), code that use a reference to SampleProfiles while performing insertion to the map must be modified to make sure not to have dangling reference.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

huangjd created this revision.Aug 23 2023, 6:22 PM

Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2023, 6:22 PM

Herald added a subscriber: wenlei. · View Herald Transcript

huangjd requested review of this revision.Aug 23 2023, 6:22 PM

Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2023, 6:22 PM

Herald added a subscriber: llvm-commits. · View Herald Transcript

huangjd updated this revision to Diff 552948.Aug 23 2023, 6:48 PM

This comment was removed by huangjd.

huangjd retitled this revision from Fix dangling reference after D147740 to [llvm-profdata] Fix dangling reference after D147740.Aug 23 2023, 6:52 PM

huangjd edited the summary of this revision. (Show Details)

huangjd added reviewers: davidxl, snehasish, hoy, wenlei, ayermolo.

huangjd added a parent revision: D147740: [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map.

huangjd mentioned this in D147740: [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map.

Harbormaster completed remote builds in B254511: Diff 552948.Aug 23 2023, 7:53 PM

Fix a dangling reference bug in ProfileConverter::flattenNestedProfile . This is an existing bug only revealed after D147740 changes the profile map container type.

Actually this is not true. Before D147740, SampleProfileMap is a unordered_map, which guarantees reference stability. See the follow statement from https://cplusplus.com/reference/unordered_map/unordered_map/rehash/

If a rehash happens, all iterators are invalidated, but references and pointers to individual elements remain valid. If no actual rehash happens, no changes.

Implementations are depending on reference stability of SampleProfileMap to simplify things. And your change in D147740 broke that. As evident by the various breakages and reverts, such change is dangerous and can be impactful. I suggest you restore reference stability for SampleProfileMap, otherwise they may be more breakages from your change yet to be uncovered.

Would HashKeyMap<unordered_map, SampleContext, FunctionSamples> be enough to restore reference stability?

wenlei requested changes to this revision.Aug 23 2023, 9:57 PM

This revision now requires changes to proceed.Aug 23 2023, 9:57 PM

This does seem to fix memory corruption.
That being said I think I am with @wenlei on this. The implementation seems fragile as DenseMap, apparently, doesn't offer reference stability. Which leads to these kind of workarounds, and more then likely more issues down the road.

I think https://reviews.llvm.org/D147740 should be reverted and concerns addressed in that patch next re-land.

DenseMap is significantly faster than std::unordered_map as previously used.

Benchmarking with an internal profile for production, when reading the function offset table alone, the original implementation (before D147740), DenseMap, and unordered_map takes 1.2, 0.78, 0.82 s respectively, and reading the full profiles takes 34.6, 29.0, 31.4 s respectively. That means using a std::unordered_map is 5-8% slower. (Meanwhile the refactoring itself has been very significant regardless of which container to use)

In D158689#4615531, @huangjd wrote:

DenseMap is significantly faster than std::unordered_map as previously used.

Benchmarking with an internal profile for production, when reading the function offset table alone, the original implementation (before D147740), DenseMap, and unordered_map takes 1.2, 0.78, 0.82 s respectively, and reading the full profiles takes 34.6, 29.0, 31.4 s respectively. That means using a std::unordered_map is 5-8% slower. (Meanwhile the refactoring itself has been very significant regardless of which container to use)

Regardless of how fast DenseMap is, correctness and robustness are higher priority. Evidently the change breaks reference stability used with the implementation and has caused a lot of breakages. I'm not confident this is the last fixed needed.

I'm fine with HashKeyMap<unordered_map, SampleContext, FunctionSamples>, but I see not providing reference stability as fragile. Spraying the code with pointer/reference updates here and there is not only fragile but also hacky IMO.

In D158689#4615531, @huangjd wrote:

DenseMap is significantly faster than std::unordered_map as previously used.

Benchmarking with an internal profile for production, when reading the function offset table alone, the original implementation (before D147740), DenseMap, and unordered_map takes 1.2, 0.78, 0.82 s respectively, and reading the full profiles takes 34.6, 29.0, 31.4 s respectively. That means using a std::unordered_map is 5-8% slower. (Meanwhile the refactoring itself has been very significant regardless of which container to use)

IMHO a more robust solution wins over marginal performance gains. The unordered_map provides fundamental guarantees and doesn't require future code owners to deal with how/when insert into SampleProfileMap.

What is the root cause of unordered_map being slower?

In D158689#4615675, @davidxl wrote:

What is the root cause of unordered_map being slower?

I think it is just the implementation in general, their algorithms are quite different.

In D158689#4615855, @huangjd wrote:

In D158689#4615675, @davidxl wrote:

What is the root cause of unordered_map being slower?

I think it is just the implementation in general, their algorithms are quite different.

The perf difference is likely directly related to reference stability. unordered_map stores pointers to values in the buckets, whereas DenseMap store the values directly in the hash buckets. DenseMap has one less indirection and better locality, at the cost of reference stability - rehash would lead to reallocation for values.

Sometimes it's okay to give up reference stability for performance, but as the container actually owning all FunctionSamples, not providing reference stability means anyone holding a reference or pointer to any FunctionSamples is prone to dangling reference issues.

I have slightly different opinion on this -- I am on the camp for better performance over a small loss of convenience.

This is no different from iterator stability (lack of) for vector type when the loop adds new elements. We don't disallow use of vector because this. This should be dealt with the right coding idiom (for DenseMap) and proper documutation (on all loops that insert new samples). Once we have good baseline, the maintenance overhead in the future will be small.

In D158689#4616041, @davidxl wrote:

I have slightly different opinion on this -- I am on the camp for better performance over a small loss of convenience.

This is no different from iterator stability (lack of) for vector type when the loop adds new elements.

It is different. iterators tend to be local and have short lifetime, so it's much more manageable for a reason.

In fact, standard only requires reference stability but not iterator stability for unordered_map.

Quote from standardese: "If a rehash happens, all iterators are invalidated, but references and pointers to individual elements remain valid."

We don't disallow use of vector because this. This should be dealt with the right coding idiom (for DenseMap) and proper documutation (on all loops that insert new samples). Once we have good baseline, the maintenance overhead in the future will be small.

It it less about the container itself, but how it is used. Here the container is used to own low level data, with long lifetime, which is why reference stability is useful.

That said, even if we still want to attempt the change to use DenseMap, I suggest we try that in a separate patch, so the majority of the original changes can get in and be stabilized first, and others can be unblocked now without needing to revert the whole patch. Evidently attempts to remove reference stability is non-trivial and also needs more testing, which warrants a change of its own.

I'd also add that robustness aside, we'd at least need to adapt current implementation cleanly. I don't think we can go with DenseMap if we have to make changes like https://reviews.llvm.org/D157061?id=547068 that patches pointers. OTOH, something like final version there is more acceptable: https://reviews.llvm.org/D157061?id=550866

In D158689#4617140, @wenlei wrote:

In D158689#4616041, @davidxl wrote:

I have slightly different opinion on this -- I am on the camp for better performance over a small loss of convenience.

This is no different from iterator stability (lack of) for vector type when the loop adds new elements.

It is different. iterators tend to be local and have short lifetime, so it's much more manageable for a reason.

In fact, standard only requires reference stability but not iterator stability for unordered_map.

Quote from standardese: "If a rehash happens, all iterators are invalidated, but references and pointers to individual elements remain valid."

True for unordered_map, but what I meant is that iterator/reference stability is not always essential to the algorithm -- and we should not be afraid of using containers without such guarantees.

We don't disallow use of vector because this. This should be dealt with the right coding idiom (for DenseMap) and proper documutation (on all loops that insert new samples). Once we have good baseline, the maintenance overhead in the future will be small.

It it less about the container itself, but how it is used. Here the container is used to own low level data, with long lifetime, which is why reference stability is useful.

Owning the data is probably not the problem but exposing the pointer/reference to them and keep them for a long life time (in particular, across changes that change the size of the container). How pervasive are those use cases?

@davidxl While discussion continues can the patch be reverted, or at least changed to unordered_map for now (preferably the former). It has been impacting our internal testing of trunk for a week now.

@wenlei, may I propose having this patch in first (to unblock) @ayermolo while we are discussing longer term strategy? Another benefit is that we can wait to see if there are other incidence (of using long lifetime reference) can show up so that we have more cases to consider the redesign (if needed).

In D158689#4617548, @davidxl wrote:

@wenlei, may I propose having this patch in first (to unblock) @ayermolo while we are discussing longer term strategy? Another benefit is that we can wait to see if there are other incidence (of using long lifetime reference) can show up so that we have more cases to consider the redesign (if needed).

Back-and-forth reverts/relands are costly and somewhat disruptive, I hope we can gain more confidence before reattempting. Here's my suggestion:

Step 1: use HashKeyMap<unordered_map, SampleContext, FunctionSamples> instead of HashKeyMap<DenseMap, SampleContext, FunctionSamples> for SampleProfileMap. It's a one line change to unblock.

Step 2: a separate patch to attempt DenseMap, coupled with all fixes needed. we can also help test correctness and performance on that change. actually the performance picture may not be that clear. reallocation is expensive, and the fact you didn't run into reallocation in your testing may indicate that the perf benchmark result may also be skewed. We also need to take into account the cost to update references etc.

WDYT?

The proposal sounds good to me.

In D158689#4617634, @wenlei wrote:

In D158689#4617548, @davidxl wrote:

@wenlei, may I propose having this patch in first (to unblock) @ayermolo while we are discussing longer term strategy? Another benefit is that we can wait to see if there are other incidence (of using long lifetime reference) can show up so that we have more cases to consider the redesign (if needed).

Back-and-forth reverts/relands are costly and somewhat disruptive, I hope we can gain more confidence before reattempting. Here's my suggestion:

Step 1: use HashKeyMap<unordered_map, SampleContext, FunctionSamples> instead of HashKeyMap<DenseMap, SampleContext, FunctionSamples> for SampleProfileMap. It's a one line change to unblock.

Step 2: a separate patch to attempt DenseMap, coupled with all fixes needed. we can also help test correctness and performance on that change. actually the performance picture may not be that clear. reallocation is expensive, and the fact you didn't run into reallocation in your testing may indicate that the perf benchmark result may also be skewed. We also need to take into account the cost to update references etc.

WDYT?

In D158689#4617692, @davidxl wrote:

The proposal sounds good to me.

In D158689#4617634, @wenlei wrote:

In D158689#4617548, @davidxl wrote:

@wenlei, may I propose having this patch in first (to unblock) @ayermolo while we are discussing longer term strategy? Another benefit is that we can wait to see if there are other incidence (of using long lifetime reference) can show up so that we have more cases to consider the redesign (if needed).

Back-and-forth reverts/relands are costly and somewhat disruptive, I hope we can gain more confidence before reattempting. Here's my suggestion:

Step 1: use HashKeyMap<unordered_map, SampleContext, FunctionSamples> instead of HashKeyMap<DenseMap, SampleContext, FunctionSamples> for SampleProfileMap. It's a one line change to unblock.

Step 2: a separate patch to attempt DenseMap, coupled with all fixes needed. we can also help test correctness and performance on that change. actually the performance picture may not be that clear. reallocation is expensive, and the fact you didn't run into reallocation in your testing may indicate that the perf benchmark result may also be skewed. We also need to take into account the cost to update references etc.

WDYT?

Great. I would be happy to test it in our environment with bigger profiles.

@huangjd can you follow up and change to unordered_map, so whole diff doesn't have to be reverted?

@huangjd Do you think you will have time to change implementation to unordered_map as discussed by EOD, otherwise I am reverting the diff to unblock out validation testing.

In D158689#4621293, @ayermolo wrote:

@huangjd Do you think you will have time to change implementation to unordered_map as discussed by EOD, otherwise I am reverting the diff to unblock out validation testing.

D159014

Update patch to use llvm::DenseMap.
Using postorder traversal when flattening the profile. In this case there is no need to lookup for the inserted profile again after its callee profiles are flattened. The computational complexity is the same (no additional operations added)

Harbormaster completed remote builds in B255362: Diff 554123.Aug 28 2023, 5:38 PM

huangjd retitled this revision from [llvm-profdata] Fix dangling reference after D147740 to [llvm-profdata] Use llvm::DenseMap in SampleProfileMap.Aug 28 2023, 5:43 PM

huangjd edited the summary of this revision. (Show Details)

nit: remove extra new line

Harbormaster completed remote builds in B255364: Diff 554125.Aug 28 2023, 5:49 PM

huangjd added inline comments.Aug 28 2023, 5:51 PM

llvm/include/llvm/ProfileData/SampleProf.h
1575	Since flatten does not change the original sample profile (const reference), applying the same logic to each profile, it should be traversal-order agnostic. Processing the profiles in any order should yield the same results

We did some measurement internally on DenseMap vs. std::unordered_map.

The test was setup on a dedicated devserver as follows:

Pick the top 100 slow compilations from a major internal build target.
The profile (csspgo) size is 774MB including 170K profiled functions and 80M samples.
Run every compilation using either version for 5 runs and calculate the average.
Report both instruction count (very low variance) and elapsed wall clock time (relatively high variance).

The result

0.02% less instructions from the unordered_map version.
1.2% less wall clock time from the unordered_map version.

We didn't measure the time spent within sample profile loader though, the above numbers are all e2e compilations.

0.02% less instructions from the unordered_map version.

1.2% less wall clock time from the unordered_map version.

We didn't measure the time spent within sample profile loader though, the above numbers are all e2e compilations.

Thanks for the measurement. This result doesn't seem to justify all the trouble to remove reference stability. The benefit of DenseMap could be offset by the cost of rehashing.

Pick the top 100 slow compilations from a major internal build target.

Among the 100 compilations measured, is there anything that shows a meaningful improvement from DenseMap?

In D158689#4628952, @wenlei wrote:

Pick the top 100 slow compilations from a major internal build target.

Among the 100 compilations measured, is there anything that shows a meaningful improvement from DenseMap?

Not really. For instruction count, the biggest win from DenseMap is 0.14%, and unordered_map also wins by 0.17% in one case. For wall clock time, there is a bigger swing due to bigger variance. DenseMap wins as much as 4.3%, but unordered_map also shows comparable wins in other cases.

Interesting. @williamjhuang can you double check your measurement ? If the improvement can be reproduced, do some profiling and understand where the performance come from is useful.

@weiwang Have you tried to generate a flame graph showing how much time each function takes?

In D158689#4629032, @davidxl wrote:

Interesting. @williamjhuang can you double check your measurement ? If the improvement can be reproduced, do some profiling and understand where the performance come from is useful.

I measured the time on reading the function offset table section and reading the entire profile, not the compilation time. The other steps in the compilation pipeline are expected to take up a much longer time than reading the profile (actually only the function table is always being read, the samples are read only if they match some functions in the source code), so we need to have a common ground on what's being measured.

The objective of this series of patch is to improve the profile reader (of course need to make sure the overall compilation time is improved as well).

In D158689#4629243, @huangjd wrote:

In D158689#4629032, @davidxl wrote:

Interesting. @williamjhuang can you double check your measurement ? If the improvement can be reproduced, do some profiling and understand where the performance come from is useful.

I measured the time on reading the function offset table section and reading the entire profile, not the compilation time. The other steps in the compilation pipeline are expected to take up a much longer time than reading the profile (actually only the function table is always being read, the samples are read only if they match some functions in the source code), so we need to have a common ground on what's being measured.

The objective of this series of patch is to improve the profile reader (of course need to make sure the overall compilation time is improved as well).

The e2e compilation should still see reduced instructions and cycles. Can you collect e2e data for more compilations (and also report profile size)? While the slowest compilation determines the build latency, the goal here is to save build pool's utilization, so you don't have to pick the slowest compilations for measurement.

In D158689#4629229, @huangjd wrote:

@weiwang Have you tried to generate a flame graph showing how much time each function takes?

I didn't collect this for the measurement. I think this change is pretty isolated to the loader pass, so if there is a win, it would show up in the e2e compilation.

I'll take a few cases and report back.

In D158689#4631799, @weiwang wrote:

In D158689#4629229, @huangjd wrote:

@weiwang Have you tried to generate a flame graph showing how much time each function takes?

I didn't collect this for the measurement. I think this change is pretty isolated to the loader pass, so if there is a win, it would show up in the e2e compilation.

I'll take a few cases and report back.

and at the end of the day e2e is what we care about, no?

This brings up to 8% speed up (31.4s vs 29.0s) when reading a large test profile, and 5% speedup (0.82s vs 0.78s) when reading the function offset table alone.

@huangjd what were you measuring exactly? is that the total wall clock / cycles for SampleProfileLoader for all modules, or were you just measuring SampleProfileReader time? From the line above, it sounds like you were measuring the latter. However, the impact of changing a container is not limited the creation of that container.

So measuring SampleProfileLoader would give us a more accurate picture of overall impact, including container construction (which is SampleProfileReader), updating and consumption (which are other parts of SampleProfileLoader).

OTOH, if the impact on SampleProfileLoader is significant, we should be able to see a difference in e2e compilation. I agree that in the end e2e compilation is what matters.

In D158689#4631824, @wenlei wrote:

This brings up to 8% speed up (31.4s vs 29.0s) when reading a large test profile, and 5% speedup (0.82s vs 0.78s) when reading the function offset table alone.

@huangjd what were you measuring exactly? is that the total wall clock / cycles for SampleProfileLoader for all modules, or were you just measuring SampleProfileReader time? From the line above, it sounds like you were measuring the latter. However, the impact of changing a container is not limited the creation of that container.

So measuring SampleProfileLoader would give us a more accurate picture of overall impact, including container construction (which is SampleProfileReader), updating and consumption (which are other parts of SampleProfileLoader).

OTOH, if the impact on SampleProfileLoader is significant, we should be able to see a difference in e2e compilation. I agree that in the end e2e compilation is what matters.

true.

However, when picking the compilations to examine, the slowest ones may not be the good candidate -- as they are likely triggering pathological compile time issue elsewhere thus makes profile loading less significant.

William's experiment may also have FSAFDO on, thus have more rounds of loading.

Revision Contents

Path

Size

llvm/

include/

llvm/

ProfileData/

SampleProf.h

23 lines

Diff 554125

llvm/include/llvm/ProfileData/SampleProf.h

Show First 20 Lines • Show All 1,406 Lines • ▼ Show 20 Lines
};		};

/// This class provides operator overloads to the map container using MD5 as the		/// This class provides operator overloads to the map container using MD5 as the
/// key type, so that existing code can still work in most cases using		/// key type, so that existing code can still work in most cases using
/// SampleContext as key.		/// SampleContext as key.
/// Note: when populating container, make sure to assign the SampleContext to		/// Note: when populating container, make sure to assign the SampleContext to
/// the mapped value immediately because the key no longer holds it.		/// the mapped value immediately because the key no longer holds it.
class SampleProfileMap		class SampleProfileMap
: public HashKeyMap<std::unordered_map, SampleContext, FunctionSamples> {		: public HashKeyMap<llvm::DenseMap, SampleContext, FunctionSamples> {
public:		public:
// Convenience method because this is being used in many places. Set the		// Convenience method because this is being used in many places. Set the
// FunctionSamples' context if its newly inserted.		// FunctionSamples' context if its newly inserted.
mapped_type &Create(const SampleContext &Ctx) {		mapped_type &Create(const SampleContext &Ctx) {
auto Ret = try_emplace(Ctx, FunctionSamples());		auto Ret = try_emplace(Ctx, FunctionSamples());
if (Ret.second)		if (Ret.second)
Ret.first->second.setContext(Ctx);		Ret.first->second.setContext(Ctx);
return Ret.first->second;		return Ret.first->second;
}		}

iterator find(const SampleContext &Ctx) {		iterator find(const SampleContext &Ctx) {
return HashKeyMap<std::unordered_map, SampleContext, FunctionSamples>::find(		return HashKeyMap<llvm::DenseMap, SampleContext, FunctionSamples>::find(
Ctx);		Ctx);
}		}

const_iterator find(const SampleContext &Ctx) const {		const_iterator find(const SampleContext &Ctx) const {
return HashKeyMap<std::unordered_map, SampleContext, FunctionSamples>::find(		return HashKeyMap<llvm::DenseMap, SampleContext, FunctionSamples>::find(
Ctx);		Ctx);
}		}

// Overloaded find() to lookup a function by name. This is called by IPO		// Overloaded find() to lookup a function by name. This is called by IPO
// passes with an actual function name, and it is possible that the profile		// passes with an actual function name, and it is possible that the profile
// reader converted function names in the profile to MD5 strings, so we need		// reader converted function names in the profile to MD5 strings, so we need
// to check if either representation matches.		// to check if either representation matches.
iterator find(StringRef Fname) {		iterator find(StringRef Fname) {
uint64_t Hash = hashFuncName(Fname);		uint64_t Hash = hashFuncName(Fname);
auto It = base_type::find(hash_code(Hash));		auto It = base_type::find(hash_code(Hash));
if (It != end()) {		if (It != end()) {
StringRef CtxName = It->second.getContext().getName();		StringRef CtxName = It->second.getContext().getName();
if (LLVM_LIKELY(CtxName == Fname \|\| CtxName == std::to_string(Hash)))		if (LLVM_LIKELY(CtxName == Fname \|\| CtxName == std::to_string(Hash)))
return It;		return It;
}		}
return end();		return end();
}		}

size_t erase(const SampleContext &Ctx) {		size_t erase(const SampleContext &Ctx) {
return HashKeyMap<std::unordered_map, SampleContext, FunctionSamples>::		return HashKeyMap<llvm::DenseMap, SampleContext, FunctionSamples>::erase(
erase(Ctx);		Ctx);
}		}

size_t erase(const key_type &Key) { return base_type::erase(Key); }		size_t erase(const key_type &Key) { return base_type::erase(Key); }
};		};

using NameFunctionSamples = std::pair<hash_code, const FunctionSamples *>;		using NameFunctionSamples = std::pair<hash_code, const FunctionSamples *>;

void sortFuncProfiles(const SampleProfileMap &ProfileMap,		void sortFuncProfiles(const SampleProfileMap &ProfileMap,
▲ Show 20 Lines • Show All 95 Lines • ▼ Show 20 Lines	if (ProfileIsCS) {
for (const auto &I : InputProfiles)		for (const auto &I : InputProfiles)
flattenNestedProfile(OutputProfiles, I.second);		flattenNestedProfile(OutputProfiles, I.second);
}		}
}		}

private:		private:
static void flattenNestedProfile(SampleProfileMap &OutputProfiles,		static void flattenNestedProfile(SampleProfileMap &OutputProfiles,
const FunctionSamples &FS) {		const FunctionSamples &FS) {
		// Use postorder traversal here so that the reference to the newly added
		// profile remains valid for duration of the function, even if the output
		// profile map relocates.
		for (const auto &I : FS.getCallsiteSamples()) {
		for (const auto &Callee : I.second) {
		const auto &CalleeProfile = Callee.second;
		// Recursively convert callee profile.
		flattenNestedProfile(OutputProfiles, CalleeProfile);
		}
		}

		huangjdAuthorUnsubmitted Done Reply Inline Actions Since flatten does not change the original sample profile (const reference), applying the same logic to each profile, it should be traversal-order agnostic. Processing the profiles in any order should yield the same results huangjd: Since flatten does not change the original sample profile (const reference), applying the same…
// To retain the context, checksum, attributes of the original profile, make		// To retain the context, checksum, attributes of the original profile, make
// a copy of it if no profile is found.		// a copy of it if no profile is found.
SampleContext &Context = FS.getContext();		SampleContext &Context = FS.getContext();
auto Ret = OutputProfiles.try_emplace(Context, FS);		auto Ret = OutputProfiles.try_emplace(Context, FS);
FunctionSamples &Profile = Ret.first->second;		FunctionSamples &Profile = Ret.first->second;
if (Ret.second) {		if (Ret.second) {
// Clear nested inlinees' samples for the flattened copy. These inlinees		// Clear nested inlinees' samples for the flattened copy. These inlinees
// will have their own top-level entries after flattening.		// will have their own top-level entries after flattening.
Show All 25 Lines	for (const auto &I : FS.getCallsiteSamples()) {
Profile.addCalledTargetSamples(		Profile.addCalledTargetSamples(
I.first.LineOffset, I.first.Discriminator, CalleeProfile.getName(),		I.first.LineOffset, I.first.Discriminator, CalleeProfile.getName(),
CalleeProfile.getHeadSamplesEstimate());		CalleeProfile.getHeadSamplesEstimate());
// Update total samples.		// Update total samples.
TotalSamples = TotalSamples >= CalleeProfile.getTotalSamples()		TotalSamples = TotalSamples >= CalleeProfile.getTotalSamples()
? TotalSamples - CalleeProfile.getTotalSamples()		? TotalSamples - CalleeProfile.getTotalSamples()
: 0;		: 0;
TotalSamples += CalleeProfile.getHeadSamplesEstimate();		TotalSamples += CalleeProfile.getHeadSamplesEstimate();
// Recursively convert callee profile.
flattenNestedProfile(OutputProfiles, CalleeProfile);
}		}
}		}
Profile.addTotalSamples(TotalSamples);		Profile.addTotalSamples(TotalSamples);

Profile.setHeadSamples(Profile.getHeadSamplesEstimate());		Profile.setHeadSamples(Profile.getHeadSamplesEstimate());
}		}

// Nest all children profiles into the profile of Node.		// Nest all children profiles into the profile of Node.
▲ Show 20 Lines • Show All 88 Lines • Show Last 20 Lines