While investigating build-time regressions from Clang 9 to Clang 10, I discovered this:
Topmost functions in Clang 10
Line # | Process Name | Module | Function | Count | Weight (in view) (ms) | % Weight |
1 | clang-cl.exe | 9813998 | 9 816 190.696800 | 74.94 | ||
2 | clang-cl.exe | 8054456 | 8 054 249.395800 | 61.49 | ||
3 | llvm::MCObjectStreamer::addPendingLabel | 134506 | 134 298.342000 | 1.03 | ||
4 | memcmp | 73094 | 73 097.908200 | 0.56 | ||
5 | llvm::StringMapImpl::LookupBucketFor | 72037 | 72 094.650600 | 0.55 | ||
6 | llvm::FoldingSetBase::FindNodeOrInsertPos | 71757 | 71 766.338400 | 0.55 | ||
7 | clang::DiagnosticsEngine::DiagStateMap::getFile | 68787 | 68 821.673400 | 0.53 | ||
8 | llvm::SmallPtrSetImpl<const llvm::BasicBlock *>::insert | 61700 | 61 694.511000 | 0.47 | ||
9 | llvm::SimpleBitstreamCursor::Read | 60287 | 60 286.848300 | 0.46 | ||
10 | llvm::ValueHandleBase::AddToUseList | 54104 | 54 100.684600 | 0.41 | ||
11 | llvm::Use::getUser | 51381 | 51 376.805000 | 0.39 | ||
12 | llvm::removeUnreachableBlocks | 48469 | 48 466.197100 | 0.37 | ||
13 | llvm::PMTopLevelManager::findAnalysisPass | 46510 | 46 503.354600 | 0.36 | ||
14 | llvm::PMDataManager::removeNotPreservedAnalysis | 44020 | 44 014.352500 | 0.34 | ||
15 | llvm::SimpleBitstreamCursor::ReadVBR64 | 42468 | 42 471.558800 | 0.32 | ||
16 | computeKnownBitsFromOperator | 39548 | 39 548.097900 | 0.30 | ||
17 | llvm::FoldingSetBase::GrowBucketCount | 38972 | 38 967.456800 | 0.30 | ||
18 | llvm::PMDataManager::initializeAnalysisImpl | 38459 | 38 455.356500 | 0.29 | ||
19 | llvm::AttributeList::hasAttribute | 38442 | 38 439.664900 | 0.29 | ||
20 | clang::Decl::castFromDeclContext | 35533 | 35 540.141800 | 0.27 | ||
Then, digging a bit:
Most samples are in just two instructions. Taking a look at the generated assembly:
00000001412571F0: 4C 39 2A cmp qword ptr [rdx],r13 00000001412571F3: 74 39 je 000000014125722E 00000001412571F5: 48 83 C2 08 add rdx,8 00000001412571F9: 48 83 C3 F8 add rbx,0FFFFFFFFFFFFFFF8h 00000001412571FD: 75 F1 jne 00000001412571F0
That is the inner loop for the std::find in MCObjectStream.cpp, L62.
I've added a counter at beginning of the function, and one to count the number of iterations in this loop:
g_Calls1 - 576,427 g_Walked - 219,731,421
Which solves to N ^ 1.45.
This issue was introduced in D71368.
The proposed fix simply replaces the SmallVector by a SmallSetVector.
Differential before/after:
Add space, i.e. MCSection *
Have you tried SmallPtrSet<MCSection *>? Does it perform better?