This is an archive of the discontinued LLVM Phabricator instance.


Differential D107529

[llvm-profgen] Fix bug of loop scope mismatch
Closed · Public

Authored by wlei on Aug 4 2021, 8:53 PM.


Details

Reviewers

hoy
wenlei
wmi

Commits

rGa8a38ef3d99c: [llvm-profgen] Fix bug of loop scope mismatch

Summary

A performance issue showed up in profile generation, and the loop at line 525 turned out to be the bottleneck. Moving that loop out of the enclosing per-probe loop fixes it: the run time drops from 30+ minutes to about 30 seconds.
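The scope bug can be sketched in isolation. This is a minimal sketch, not the llvm-profgen code: `Frame`, `Profile`, `ProbeHit` and `populate` are hypothetical stand-ins, and the zero-fill pass plays the role of the `for (auto &I : FrameSamples)` loop. When that pass sits inside the per-probe loop, it re-walks every frame recorded so far on each iteration, so the work grows quadratically with the number of probes while the final counts stay the same; moving it after the loop makes it a single linear pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for the llvm-profgen types (not the real API).
struct Frame {
  std::vector<int> ProbeIds; // every probe index that belongs to this frame
};
struct Profile {
  std::unordered_map<int, std::uint64_t> Body; // probe index -> sample count
};
struct ProbeHit {
  const Frame *F;      // frame the sampled probe belongs to
  int Id;              // probe index
  std::uint64_t Count; // samples attributed to the probe
  Profile *P;          // leaf profile that receives the samples
};

// Records all hits, then gives every probe of every recorded frame an explicit
// entry. Returns how many zero-fill steps were executed.
std::size_t populate(const std::vector<ProbeHit> &Hits,
                     bool ZeroFillInsideLoop) {
  std::unordered_map<const Frame *, std::unordered_set<Profile *>> FrameSamples;
  std::size_t Steps = 0;
  auto ZeroFill = [&] {
    for (auto &I : FrameSamples)
      for (Profile *P : I.second)
        for (int Id : I.first->ProbeIds) {
          P->Body[Id] += 0; // creates a 0 entry for probes without samples
          ++Steps;
        }
  };
  for (const ProbeHit &H : Hits) {
    assert(H.P != nullptr && "every hit needs a profile");
    FrameSamples[H.F].insert(H.P);
    H.P->Body[H.Id] += H.Count;
    if (ZeroFillInsideLoop)
      ZeroFill(); // buggy scope: rescans all frames recorded so far, per probe
  }
  if (!ZeroFillInsideLoop)
    ZeroFill(); // fixed scope: one pass after every probe has been recorded
  return Steps;
}
```

For 1,000 probes in 1,000 distinct single-probe frames, the nested placement performs 1 + 2 + ... + 1000 = 500,500 zero-fill steps versus 1,000 for the fixed placement, and both produce the same profiles.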

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wlei created this revision. Aug 4 2021, 8:53 PM

Herald added subscribers: hoy, wenlei, lxfind. Aug 4 2021, 8:53 PM

wlei requested review of this revision. Aug 4 2021, 8:53 PM

Herald added a project: Restricted Project. Aug 4 2021, 8:53 PM

Herald added a subscriber: llvm-commits.

wlei edited the summary of this revision. Aug 4 2021, 9:00 PM

wlei added reviewers: hoy, wenlei, wmi.

Good catch! Is this the cause of the slowdown we saw today, and is it a regression?

This revision is now accepted and ready to land. Aug 4 2021, 9:05 PM

wlei added inline comments. Aug 4 2021, 9:06 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525	I'm wondering whether `FunctionSamples *` here should be `Vector<FunctionSamples *>`, since the same frame can point to different FunctionSamples?

In D107529#2927429, @wenlei wrote:

Good catch! Is this the cause of the slowdown we saw today, and is it a regression?

Yes, it's the cause of today's slowdown and it is a regression (the fix should also speed up other cases). I reproduced @spupyrev's script and saw that 99% of the time was spent in the line 525 loop, which runs over 100,000+ entries.

Harbormaster completed remote builds in B118065: Diff 364336. Aug 4 2021, 9:21 PM

Good catch, thanks for the fix!

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525	An inline context under a given calling context should uniquely identify an inline frame and therefore can only point to one leaf profile. Am I missing anything?

hoy accepted this revision. Aug 4 2021, 9:24 PM

wlei added inline comments. Aug 4 2021, 9:34 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525	Sorry, I meant `std::set<FunctionSamples *>`. I'm thinking of cases where probes from different call stacks share the same leaf probe, like [Probe1, Probe3] and [Probe1, Probe2]. Here Probe1 would have two `FunctionSamples`.

wlei added inline comments. Aug 4 2021, 9:40 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525	Oh, `populateBodySamplesWithProbes` already guarantees a unique call stack here, so the multiple-probe case can't happen. Sorry for the confusion.

Hmm, I compared the output of this patch against the previous output, and it's not identical. I need more time to dig into it before landing this.

Change to `std::unordered_set<FunctionSamples *>` instead of `FunctionSamples *`.

wlei added inline comments. Aug 5 2021, 12:36 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525	@hoy I found that `Probe->getInlineTreeNode()` is not the leaf frame, so the key here is not unique. The unique key would be something like "probe + leaf function", where the leaf function comes from the probe's FuncDesc, and it turns out there really are different leaf functions. See the reference below: as its comment notes, the context from a probe doesn't include the leaf frame, so when we extract the inline context we still need to append the leaf frame.

    void MCPseudoProbeDecoder::getInlineContextForProbe(
        const MCDecodedPseudoProbe *Probe,
        SmallVectorImpl<std::string> &InlineContextStack, bool IncludeLeaf) const {
      Probe->getInlineContext(InlineContextStack, GUID2FuncDescMap, true);
      if (!IncludeLeaf)
        return;
      // Note that the context from probe doesn't include leaf frame,
      // hence we need to retrieve and prepend leaf if requested.
      const auto *FuncDesc = getFuncDescForGUID(Probe->getGuid());
      InlineContextStack.emplace_back(FuncDesc->FuncName + ":" +
                                      Twine(Probe->getIndex()).str());
    }

So I changed it to use an `unordered_set` that holds the FunctionSamples of multiple leaf frames. Does this look good to you?
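The difference between the old and new value types of `FrameSamples` can be sketched with simplified stand-ins. This is a minimal sketch, assuming hypothetical `InlineTreeNode` and `LeafProfile` types in place of `MCDecodedPseudoProbeInlineTree` and `FunctionSamples`, and assuming two leaf profiles (`foo` and `bar`) are recorded under the same inline frame. With a single pointer per frame, the second profile overwrites the first, so the first never receives its zero-count entries; a set keeps both.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for MCDecodedPseudoProbeInlineTree and
// FunctionSamples; only the fields needed for the illustration.
struct InlineTreeNode {
  std::vector<int> Probes; // all probe indices recorded for this frame
};
struct LeafProfile {
  std::string Name;                  // leaf function, e.g. "foo" or "bar"
  std::map<int, std::uint64_t> Body; // probe index -> sample count
};

using Hits = std::vector<std::pair<InlineTreeNode *, LeafProfile *>>;

// Old value type: one profile per frame, so a later hit overwrites an earlier
// leaf profile and the earlier one is never zero-filled.
void zeroFillSingle(const Hits &Recorded) {
  std::unordered_map<InlineTreeNode *, LeafProfile *> FrameSamples;
  for (const auto &[Node, Prof] : Recorded)
    FrameSamples[Node] = Prof;
  for (auto &I : FrameSamples)
    for (int Id : I.first->Probes)
      I.second->Body[Id] += 0; // creates a 0 entry if missing
}

// New value type: every leaf profile seen under a frame is kept and filled.
void zeroFillSet(const Hits &Recorded) {
  std::unordered_map<InlineTreeNode *, std::unordered_set<LeafProfile *>>
      FrameSamples;
  for (const auto &[Node, Prof] : Recorded)
    FrameSamples[Node].insert(Prof);
  for (auto &I : FrameSamples)
    for (LeafProfile *P : I.second)
      for (int Id : I.first->Probes)
        P->Body[Id] += 0; // creates a 0 entry if missing
}
```

With `zeroFillSingle`, only the last-recorded profile gets entries for every probe of the frame; with `zeroFillSet`, every profile recorded under that frame does.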

hoy added inline comments. Aug 5 2021, 1:02 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525	I see: a callsite can lead to different leaf frames if it is an indirect callsite. That's a good find, thanks! Yes, using `unordered_set` sounds good to me. Does it close the gap you saw with the loop scope change?

Harbormaster completed remote builds in B118235: Diff 364578. Aug 5 2021, 1:05 PM

wlei added inline comments. Aug 5 2021, 1:48 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525	Yes. The output with this patch is now identical to the output without it when running the `clang` build script.

hoy added inline comments. Aug 5 2021, 1:51 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525	That's nice. I wonder what could cause the same inline context to point to different frames. ICP (indirect call promotion) shouldn't be triggered for the clang pass1 build, should it?

wlei added inline comments. Aug 5 2021, 2:13 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525	My understanding is that the same frame can point to different inline contexts, while the same inline context should point to the same frame. For example, probe1 (GUID: foo) and probe2 (GUID: bar) can point to the same frame (`inlinetree`). If that `inlinetree`'s extracted context is `main @ goo`, their full contexts will be `main @ goo @ foo` for probe1 and `main @ goo @ bar` for probe2.
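wlei's example can be made concrete with a small sketch. It assumes a frame's inline context is a list of function names ordered from the outermost caller inward and that the leaf is appended separately, as `getInlineContextForProbe` does when `IncludeLeaf` is set (the real code appends `FuncName:Index` strings; this sketch simplifies them to bare names). `fullContext` is a hypothetical helper, not an llvm-profgen API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not an llvm-profgen API): a frame's context lists the
// inlined callers from the outermost one inward and, like the context stored
// with a probe, stops short of the leaf. The leaf function is appended
// separately, mirroring getInlineContextForProbe with IncludeLeaf set.
std::string fullContext(const std::vector<std::string> &FrameContext,
                        const std::string &LeafFunc) {
  assert(!LeafFunc.empty() && "a probe always belongs to some leaf function");
  std::string Result;
  for (const std::string &Caller : FrameContext) {
    Result += Caller;
    Result += " @ ";
  }
  Result += LeafFunc;
  return Result;
}
```

Because both probes share the frame context `main @ goo` but end in different leaves, a map keyed only by that frame has to hold more than one leaf profile.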

hoy added inline comments. Aug 5 2021, 3:07 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525	In your example, do `foo` and `bar` share the same callsite in `goo`?

wlei added inline comments. Aug 5 2021, 3:20 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525	Yes, that's why I changed this to `unordered_set`. As you mentioned, `foo` and `bar` are called through an indirect call.

hoy added inline comments. Aug 5 2021, 3:28 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525	I was wondering how the indirect call got promoted in the clang pass1 build. We don't use a profile for pass1, so ICP shouldn't be triggered, right?

hoy added inline comments. Aug 5 2021, 4:20 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525	I guess two probes with inline contexts can be merged, with the merged inline context trimmed to either empty or the common part of the inputs. Anyway, that's unrelated to this patch. Using an `unordered_set` sounds good to me.
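hoy's guess is that when two probes with inline contexts are merged, the result keeps only what the inputs have in common. Purely to illustrate that hypothesis (the thread leaves open whether the compiler actually does this), here is a sketch assuming an inline context is a list of function names with the outermost caller first, so the shared part is the longest common prefix, which may be empty. `mergedContext` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration of the hypothesis only: when two inline contexts
// (outermost caller first) are merged, keep just the frames they share from
// the root; the result may be empty.
std::vector<std::string> mergedContext(const std::vector<std::string> &A,
                                       const std::vector<std::string> &B) {
  std::size_t N = 0;
  while (N < A.size() && N < B.size() && A[N] == B[N])
    ++N;
  return std::vector<std::string>(A.begin(),
                                  A.begin() + static_cast<std::ptrdiff_t>(N));
}
```

Under this assumption, `main @ goo @ foo` and `main @ goo @ bar` would merge to `main @ goo`, while contexts with different roots would merge to an empty context.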

wlei added inline comments. Aug 5 2021, 4:27 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525	Thanks for the discussion! That's an interesting question. I tried to find something in the `clang` asm, but it's too large to debug. I'll land this first and later try running on SPEC to see if something is easier to catch there.

wlei edited the summary of this revision. Aug 5 2021, 4:31 PM

Closed by commit rGa8a38ef3d99c: [llvm-profgen] Fix bug of loop scope mismatch (authored by wlei). Aug 5 2021, 4:53 PM

This revision was automatically updated to reflect the committed changes.

wlei added a commit: rGa8a38ef3d99c: [llvm-profgen] Fix bug of loop scope mismatch.

Revision Contents

Path	Size
llvm/tools/llvm-profgen/ProfileGenerator.cpp	17 lines

Diff 364655

llvm/tools/llvm-profgen/ProfileGenerator.cpp

 //===-- ProfileGenerator.cpp - Profile Generator ---------------*- C++ -*-===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//

 #include "ProfileGenerator.h"
 #include "llvm/ProfileData/ProfileCommon.h"
+#include <unordered_set>

 static cl::opt<std::string> OutputFilename("output", cl::value_desc("output"),
                                            cl::Required,
                                            cl::desc("Output profile file"));
 static cl::alias OutputA("o", cl::desc("Alias for --output"),
                          cl::aliasopt(OutputFilename));

 static cl::opt<SampleProfileFormat> OutputFormat(
(496 unchanged lines not shown)

 void PseudoProbeCSProfileGenerator::populateBodySamplesWithProbes(
     const RangeSample &RangeCounter,
     SmallVectorImpl<std::string> &ContextStrStack, ProfiledBinary *Binary) {
   ProbeCounterMap ProbeCounter;
   // Extract the top frame probes by looking up each address among the range in
   // the Address2ProbeMap
   extractProbesFromRange(RangeCounter, ProbeCounter, Binary);
-  std::unordered_map<MCDecodedPseudoProbeInlineTree *, FunctionSamples *>
+  std::unordered_map<MCDecodedPseudoProbeInlineTree *,
+                     std::unordered_set<FunctionSamples *>>
       FrameSamples;
   for (auto PI : ProbeCounter) {
     const MCDecodedPseudoProbe *Probe = PI.first;
     uint64_t Count = PI.second;
     FunctionSamples &FunctionProfile =
         getFunctionProfileForLeafProbe(ContextStrStack, Probe, Binary);
     // Record the current frame and FunctionProfile whenever samples are
     // collected for non-danglie probes. This is for reporting all of the
     // zero count probes of the frame later.
-    FrameSamples[Probe->getInlineTreeNode()] = &FunctionProfile;
+    FrameSamples[Probe->getInlineTreeNode()].insert(&FunctionProfile);
     FunctionProfile.addBodySamplesForProbe(Probe->getIndex(), Count);
     FunctionProfile.addTotalSamples(Count);
     if (Probe->isEntry()) {
       FunctionProfile.addHeadSamples(Count);
       // Look up for the caller's function profile
       const auto *InlinerDesc = Binary->getInlinerDescForProbe(Probe);
       if (InlinerDesc != nullptr) {
         // Since the context id will be compressed, we have to use callee's
(12 unchanged lines not shown, inside "if (Probe->isEntry()) {")
         CallerProfile.setFunctionHash(InlinerDesc->FuncHash);
         CallerProfile.addBodySamples(CallerIndex, 0, Count);
         CallerProfile.addTotalSamples(Count);
         CallerProfile.addCalledTargetSamples(
             CallerIndex, 0,
             FunctionProfile.getContext().getNameWithoutContext(), Count);
       }
     }
+  }

   // Assign zero count for remaining probes without sample hits to
   // differentiate from probes optimized away, of which the counts are unknown
   // and will be inferred by the compiler.
   for (auto &I : FrameSamples) {
-    auto *FunctionProfile = I.second;
+    for (auto *FunctionProfile : I.second) {
       for (auto *Probe : I.first->getProbes()) {
         FunctionProfile->addBodySamplesForProbe(Probe->getIndex(), 0);
       }
     }
   }
 }

 void PseudoProbeCSProfileGenerator::populateBoundarySamplesWithProbes(
(68 unchanged lines not shown)