This is an archive of the discontinued LLVM Phabricator instance.

[MC] Fix quadratic behavior in addPendingLabel
ClosedPublic

Authored by aganea on Apr 23 2020, 5:20 PM.

Details

Summary

While investigating build-time regressions from Clang 9 to Clang 10, I discovered this:

Topmost functions in Clang 10

Line #Process NameModuleFunctionCountWeight (in view) (ms)% Weight
1clang-cl.exe98139989 816 190.69680074.94
2clang-cl.exe80544568 054 249.39580061.49
3llvm::MCObjectStreamer::addPendingLabel134506134 298.3420001.03
4memcmp7309473 097.9082000.56
5llvm::StringMapImpl::LookupBucketFor7203772 094.6506000.55
6llvm::FoldingSetBase::FindNodeOrInsertPos7175771 766.3384000.55
7clang::DiagnosticsEngine::DiagStateMap::getFile6878768 821.6734000.53
8llvm::SmallPtrSetImpl<const llvm::BasicBlock *>::insert6170061 694.5110000.47
9llvm::SimpleBitstreamCursor::Read6028760 286.8483000.46
10llvm::ValueHandleBase::AddToUseList5410454 100.6846000.41
11llvm::Use::getUser5138151 376.8050000.39
12llvm::removeUnreachableBlocks4846948 466.1971000.37
13llvm::PMTopLevelManager::findAnalysisPass4651046 503.3546000.36
14llvm::PMDataManager::removeNotPreservedAnalysis4402044 014.3525000.34
15llvm::SimpleBitstreamCursor::ReadVBR644246842 471.5588000.32
16computeKnownBitsFromOperator3954839 548.0979000.30
17llvm::FoldingSetBase::GrowBucketCount3897238 967.4568000.30
18llvm::PMDataManager::initializeAnalysisImpl3845938 455.3565000.29
19llvm::AttributeList::hasAttribute3844238 439.6649000.29
20clang::Decl::castFromDeclContext3553335 540.1418000.27

Then, digging a bit:

Most samples are in just two instructions. Taking a look at the generated assembly:

00000001412571F0: 4C 39 2A           cmp         qword ptr [rdx],r13
00000001412571F3: 74 39              je          000000014125722E
00000001412571F5: 48 83 C2 08        add         rdx,8
00000001412571F9: 48 83 C3 F8        add         rbx,0FFFFFFFFFFFFFFF8h
00000001412571FD: 75 F1              jne         00000001412571F0

That is the inner loop for the std::find in MCObjectStream.cpp, L62.

I've added a counter at beginning of the function, and one to count the number of iterations in this loop:

g_Calls1 -      576,427
g_Walked -	219,731,421

Which solves to N ^ 1.45.

This issue was introduced in D71368.

The proposed fix simply replaces the SmallVector by a SmallSetVector.

Differential before/after:

Diff Detail

Event Timeline

aganea created this revision.Apr 23 2020, 5:20 PM
MaskRay added a comment.EditedApr 23 2020, 9:26 PM

Thanks for noticing this and fixing it! I am wondering why PendingLabels can be large. Do you happen to note which pattern causes a large number of PendingLabels elements and causes the slowdown?

llvm/include/llvm/MC/MCObjectStreamer.h
42

Add space, i.e. MCSection *

Have you tried SmallPtrSet<MCSection *>? Does it perform better?

aganea added a comment.EditedApr 24 2020, 5:31 AM

Thanks for taking a look @MaskRay !

This seems related to inlining. We are compiling using unity/blobs/jumbo files, can this explain the behavior seen?

Line #, Process, Function, Stack - (Callers of clang-cl.exe!llvm::MCObjectStreamer::addPendingLabel), Count, Weight (in view) (ms), % Weight
 4, , llvm::MCObjectStreamer::addPendingLabel, , 3132, 3,132.066900, 0.25
 5, , , clang-cl.exe!llvm::MCObjectStreamer::addPendingLabel, 3129, 3,129.079500, 0.25
 6, , ,   |- clang-cl.exe!llvm::CodeViewDebug::emitLocalVariable, 1686, 1,685.979300, 0.13
 7, , ,   |    clang-cl.exe!llvm::CodeViewDebug::emitLocalVariableList, 1686, 1,685.979300, 0.13
 8, , ,   |    |- clang-cl.exe!llvm::CodeViewDebug::emitInlinedCallSite, 1494, 1,494.012700, 0.12
 9, , ,   |    |    |- clang-cl.exe!llvm::CodeViewDebug::emitInlinedCallSite, 1166, 1,166.030500, 0.09
10, , ,   |    |    |    |- clang-cl.exe!llvm::CodeViewDebug::emitInlinedCallSite, 833, 833.003600, 0.07
11, , ,   |    |    |    |    |- clang-cl.exe!llvm::CodeViewDebug::emitInlinedCallSite, 567, 567.000200, 0.05
12, , ,   |    |    |    |    |    |- clang-cl.exe!llvm::CodeViewDebug::emitInlinedCallSite, 345, 345.022900, 0.03
13, , ,   |    |    |    |    |    |    |- clang-cl.exe!llvm::CodeViewDebug::emitInlinedCallSite, 214, 214.008400, 0.02
14, , ,   |    |    |    |    |    |    |    |- clang-cl.exe!llvm::CodeViewDebug::emitDebugInfoForFunction, 143, 143.023800, 0.01
15, , ,   |    |    |    |    |    |    |    |- clang-cl.exe!llvm::CodeViewDebug::emitInlinedCallSite, 71, 70.984600, 0.01
16, , ,   |    |    |    |    |    |    |- clang-cl.exe!llvm::CodeViewDebug::emitDebugInfoForFunction, 131, 131.014500, 0.01
17, , ,   |    |    |    |    |    |- clang-cl.exe!llvm::CodeViewDebug::emitDebugInfoForFunction, 222, 221.977300, 0.02
18, , ,   |    |    |    |    |- clang-cl.exe!llvm::CodeViewDebug::emitDebugInfoForFunction, 266, 266.003400, 0.02
19, , ,   |    |    |    |- clang-cl.exe!llvm::CodeViewDebug::emitDebugInfoForFunction, 333, 333.026900, 0.03
20, , ,   |    |    |- clang-cl.exe!llvm::CodeViewDebug::emitDebugInfoForFunction, 328, 327.982200, 0.03
21, , ,   |    |- clang-cl.exe!llvm::CodeViewDebug::emitDebugInfoForFunction, 173, 172.972900, 0.01
22, , ,   |    |- clang-cl.exe!llvm::CodeViewDebug::emitLexicalBlock, 19, 18.993700, 0.00
23, , ,   |- clang-cl.exe!llvm::CodeViewDebug::emitInlinedCallSite, 811, 811.067000, 0.06
24, , ,   |- clang-cl.exe!llvm::CodeViewDebug::emitDebugInfoForFunction, 150, 149.963500, 0.01
25, , ,   |- clang-cl.exe!llvm::DebugHandlerBase::endInstruction, 98, 97.998700, 0.01
26, , ,   |- clang-cl.exe!llvm::MCObjectStreamer::emitCVLocDirective, 84, 83.992700, 0.01
27, , ,   |- clang-cl.exe!llvm::AsmPrinter::emitFunctionEntryLabel, 54, 54.009600, 0.00
28, , ,   |- clang-cl.exe!llvm::MCObjectStreamer::emitCFILabel, 39, 39.018600, 0.00
29, , ,   |- clang-cl.exe!llvm::AsmPrinter::emitBasicBlockStart, 37, 37.036400, 0.00
30, , ,   |- clang-cl.exe!llvm::AsmPrinter::emitFunctionHeader, 31, 31.037600, 0.00
31, , ,   |- clang-cl.exe!EmitUnwindInfo, 28, 28.029200, 0.00
32, , ,   |- clang-cl.exe!llvm::CodeViewDebug::emitDebugInfoForUDTs, 24, 23.999000, 0.00
33, , ,   |- clang-cl.exe!llvm::AsmPrinter::emitGlobalVariable, 24, 23.991300, 0.00
34, , ,   |- clang-cl.exe!llvm::CodeViewDebug::emitLexicalBlock, 17, 16.988900, 0.00
35, , ,   |- clang-cl.exe!llvm::CodeViewDebug::emitDebugInfoForGlobal, 14, 13.999800, 0.00
36, , ,   |- clang-cl.exe!llvm::WinException::beginFunclet, 14, 13.969700, 0.00
37, , ,   |- clang-cl.exe!llvm::AsmPrinter::emitFunctionBody, 6, 6.000600, 0.00
38, , ,   |- clang-cl.exe!llvm::WinException::emitCXXFrameHandler3Table, 6, 5.998700, 0.00
39, , ,   |- clang-cl.exe!llvm::CodeViewDebug::emitDebugInfoForThunk, 5, 4.999600, 0.00
40, , ,   |- clang-cl.exe!llvm::DebugHandlerBase::beginInstruction, 1, 0.999300, 0.00
aganea updated this revision to Diff 259872.Apr 24 2020, 5:36 AM

Using a SmallPtrSet. The performance is the same as SmallDenseSet.

Shouldn't they be functionally equivalent? I know SmallPtrSet does a linear search when the set is 'small' but that shouldn't matter, the cache effects dominate.

MaskRay accepted this revision.Apr 24 2020, 8:29 AM

LGTM.

This revision is now accepted and ready to land.Apr 24 2020, 8:29 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 24 2020, 10:16 AM

Thanks for finding this!

aganea edited the summary of this revision. (Show Details)Apr 24 2020, 11:48 AM
MaskRay added a comment.EditedApr 24 2020, 12:34 PM

I always use arcanist to upload a differential and that sets up everything correctly https://llvm.org/docs/GettingStarted.html#sending-patches

I use a shell function to drop unneeded Phabricator tags (Reviewers: Subscribers: Tags: and the substring Summary :).

arcfilter () {
        arc amend
        git log -1 --pretty=%B | awk '/Reviewers:|Subscribers:/{p=1} /Reviewed By:|Differential Revision:/{p=0} !p && !/^Summary:$/ {sub(/^Summary: /,
"");print}' | git commit --amend -F -
}

It was suggested that these tags can be removed on the Phabricator server side https://lists.llvm.org/pipermail/llvm-dev/2020-April/140808.html

Iterating over a SmallPrtSet is non-deterministic; please use SmallSetVector.

Iterating over a SmallPrtSet is non-deterministic; please use SmallSetVector.

Thanks for noting this. Forgot it.

aganea updated this revision to Diff 260100.Apr 25 2020, 8:06 AM
aganea marked an inline comment as done.

As suggested by @efriedma, PTAL.

aganea reopened this revision.Apr 25 2020, 8:07 AM
This revision is now accepted and ready to land.Apr 25 2020, 8:07 AM
aganea requested review of this revision.Apr 25 2020, 8:07 AM
MaskRay accepted this revision.Apr 25 2020, 8:55 AM

Looks great!

This revision is now accepted and ready to land.Apr 25 2020, 8:55 AM
skan accepted this revision.Apr 25 2020, 7:46 PM

LGTM (with one comment)

llvm/include/llvm/MC/MCObjectStreamer.h
42

Update this in description.

The proposed fix simply replaces the SmallVector by a (Small)DenseSet SmallPtrSet.

aganea edited the summary of this revision. (Show Details)Apr 26 2020, 7:36 AM
This revision was automatically updated to reflect the committed changes.