While investigating build-time regressions from Clang 9 to Clang 10, I discovered this:
**Topmost functions in Clang 10**
| Line # | Process Name | Module | Function | Count | Weight (in view) (ms) | % Weight |
| 1 | `clang-cl.exe` | | | 9813998 | 9 816 190.696800 | 74.94 |
| 2 | | `clang-cl.exe` | | 8054456 | 8 054 249.395800 | 61.49 |
| 3 | | | `llvm::MCObjectStreamer::addPendingLabel` | 134506 | 134 298.342000 | 1.03 |
| 4 | | | `memcmp` | 73094 | 73 097.908200 | 0.56 |
| 5 | | | `llvm::StringMapImpl::LookupBucketFor` | 72037 | 72 094.650600 | 0.55 |
| 6 | | | `llvm::FoldingSetBase::FindNodeOrInsertPos` | 71757 | 71 766.338400 | 0.55 |
| 7 | | | `clang::DiagnosticsEngine::DiagStateMap::getFile` | 68787 | 68 821.673400 | 0.53 |
| 8 | | | `llvm::SmallPtrSetImpl<const llvm::BasicBlock *>::insert` | 61700 | 61 694.511000 | 0.47 |
| 9 | | | `llvm::SimpleBitstreamCursor::Read` | 60287 | 60 286.848300 | 0.46 |
| 10 | | | `llvm::ValueHandleBase::AddToUseList` | 54104 | 54 100.684600 | 0.41 |
| 11 | | | `llvm::Use::getUser` | 51381 | 51 376.805000 | 0.39 |
| 12 | | | `llvm::removeUnreachableBlocks` | 48469 | 48 466.197100 | 0.37 |
| 13 | | | `llvm::PMTopLevelManager::findAnalysisPass` | 46510 | 46 503.354600 | 0.36 |
| 14 | | | `llvm::PMDataManager::removeNotPreservedAnalysis` | 44020 | 44 014.352500 | 0.34 |
| 15 | | | `llvm::SimpleBitstreamCursor::ReadVBR64` | 42468 | 42 471.558800 | 0.32 |
| 16 | | | `computeKnownBitsFromOperator` | 39548 | 39 548.097900 | 0.30 |
| 17 | | | `llvm::FoldingSetBase::GrowBucketCount` | 38972 | 38 967.456800 | 0.30 |
| 18 | | | `llvm::PMDataManager::initializeAnalysisImpl` | 38459 | 38 455.356500 | 0.29 |
| 19 | | | `llvm::AttributeList::hasAttribute` | 38442 | 38 439.664900 | 0.29 |
| 20 | | | `clang::Decl::castFromDeclContext` | 35533 | 35 540.141800 | 0.27 |
Then, digging a bit:
{F11786048}
Most samples are in just two instructions. Taking a look at the generated assembly:
```
00000001412571F0: 4C 39 2A cmp qword ptr [rdx],r13
00000001412571F3: 74 39 je 000000014125722E
00000001412571F5: 48 83 C2 08 add rdx,8
00000001412571F9: 48 83 C3 F8 add rbx,0FFFFFFFFFFFFFFF8h
00000001412571FD: 75 F1 jne 00000001412571F0
```
That is the inner loop for the `std::find` in `MCObjectStream.cpp, L62`.
I've added a counter at beginning of the function, and one to count the number of iterations in this loop:
```
g_Calls1 - 576,427
g_Walked - 219,731,421
```
Which solves to `N ^ 1.45`.
This issue was introduced in D71368.
The proposed fix simply replaces the SmallVector by a ~~(Small)DenseSet~~ SmallPtrSet.
Differential before/after:
{F11786474}