This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profgen] Fix bug of loop scope mismatch
ClosedPublic

Authored by wlei on Aug 4 2021, 8:53 PM.

Details

Summary

One performance issue happened in profile generation and it turned out the line 525 loop is the bottleneck.
Moving the code outside of loop scope can fix this issue. The run time is improved from 30+mins to ~30s.

Diff Detail

Event Timeline

wlei created this revision.Aug 4 2021, 8:53 PM
wlei requested review of this revision.Aug 4 2021, 8:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 4 2021, 8:53 PM
wlei edited the summary of this revision. (Show Details)Aug 4 2021, 9:00 PM
wlei added reviewers: hoy, wenlei, wmi.
wenlei accepted this revision.Aug 4 2021, 9:05 PM

Good catch! Is this cause of the slow down we saw today and is this a regression?

This revision is now accepted and ready to land.Aug 4 2021, 9:05 PM
wlei added inline comments.Aug 4 2021, 9:06 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525

I'm wondering here if FunctionSamples * should be Vector<FunctionSamples *> as same frame can point to different FunctionSamples?

wlei added a subscriber: spupyrev.EditedAug 4 2021, 9:11 PM

Good catch! Is this cause of the slow down we saw today and is this a regression?

Yes, it's today's slowing down and regression(it should also improve other cases), I reproduced @spupyrev 's script and saw 99% of the time spent is in line 525 loop and the loop size is 100,000+.

hoy added a comment.Aug 4 2021, 9:24 PM

Good catch, thanks for the fix!

llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525

An inline context under a given calling context should uniquely identify an inline frame, therefore can only point to one leaf profile. Am I missing anything?

hoy accepted this revision.Aug 4 2021, 9:24 PM
wlei added inline comments.Aug 4 2021, 9:34 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525

Sorry I meant std::set<FunctionSamples *>.

I'm thinking the multiple probes cases from different call stack and they have the same leaf probe. like
[Probe1, Probe3] and [Probe1, Probe2]

Here Probe1 will have two FunctionSamples.

wlei added inline comments.Aug 4 2021, 9:40 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525

Oh. Here in populateBodySamplesWithProbes has already guaranteed to have unique call stack, so no multiple probe cases. Sorry for the confusing.

wlei added a comment.Aug 5 2021, 10:09 AM

Hmm, I compared this patch against previous output, it's not identical. I still need some time digging it before land this.

wlei updated this revision to Diff 364578.Aug 5 2021, 12:27 PM

change to std::unordered_set<FunctionSamples *> instead of FunctionSamples *

wlei added inline comments.Aug 5 2021, 12:36 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525

@hoy I found that the probe->getInlineTreeNode() is not the leaf frame so here it's not unique, the unique one should be like "probe + leaf function" and the leaf function is given from its FuncDesc. It happened that there are different the leaf functions.

See the reference below, "Note that the context from probe doesn't include leaf frame, hence we need to retrieve and prepend leaf if requested.", when we extract the inline context, we still need to append the leaf frame.

void MCPseudoProbeDecoder::getInlineContextForProbe(
    const MCDecodedPseudoProbe *Probe,
    SmallVectorImpl<std::string> &InlineContextStack, bool IncludeLeaf) const {
  Probe->getInlineContext(InlineContextStack, GUID2FuncDescMap, true);
  if (!IncludeLeaf)
    return;
  // Note that the context from probe doesn't include leaf frame,
  // hence we need to retrieve and prepend leaf if requested.
  const auto *FuncDesc = getFuncDescForGUID(Probe->getGuid());
  InlineContextStack.emplace_back(FuncDesc->FuncName + ":" +
                                  Twine(Probe->getIndex()).str());
}

So I changed to use unordered_set for multiple leaf frame FunctionSample. See if this looks good to you?

hoy added inline comments.Aug 5 2021, 1:02 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525

I see, a callsite can lead to different leaf frames, if it is an indirect callsite. That's a good find, thanks! Yeah, using unordered_set sounds good to me. Does it close the gap you saw with the loop scope change?

wlei added inline comments.Aug 5 2021, 1:48 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525

Yes, Now the output of this change is identical to the one without this change while building the clang script

hoy added inline comments.Aug 5 2021, 1:51 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525

That's nice. Wondering what could cause same inline context to point to different frames. For the clang pass1 build, ICP shouldn't be triggered?

wlei added inline comments.Aug 5 2021, 2:13 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525

My understanding is the same frame can point to different inline contexts, same inline context should point to same frame. e.g. probe1(GUID:foo) and probe2(GUID:bar) can point to the same frame(inlinetree) and when the inlinetree's context is extracted like main @ goo and there full context will be
probe1 : main @ goo @ foo
probe2: main @ goo @ bar.

hoy added inline comments.Aug 5 2021, 3:07 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525

In your example, does foo and bar share the same callsite in goo?

wlei added inline comments.Aug 5 2021, 3:20 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525

Yes, that's why here changed to unordered_set, and as you mentioned, foo and bar is indirect call

hoy added inline comments.Aug 5 2021, 3:28 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525

I was wondering for clang pass1 build, how was the indirect call promoted. We don't use profile for pass1 thus ICP should not be triggered?

hoy added inline comments.Aug 5 2021, 4:20 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525

I guess two probes with inline contexts can be merged, and the merged inline context is trimmed to empty of a common part of the inputs. Anyway, that's unrelated to this patch. Using an unordered_set sounds good to me.

wlei added inline comments.Aug 5 2021, 4:27 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
524–525

Thanks for the discussion! That's interesting question, I'm trying to find something in the clang asm, but it's too large to debug. I will land this first and later try to run on SPEC to see if something easy to catch.

wlei edited the summary of this revision. (Show Details)Aug 5 2021, 4:31 PM
This revision was automatically updated to reflect the committed changes.