Prior to this change, BumpPtrAllocatorImpl<>::Allocate was instantiated
everywhere that included this file, and it was expensive, according to
ClangBuildAnalyzer:
- Templates that took longest to instantiate:
   44057 ms: llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096, 4096, 128>::Allocate (3230 times, avg 13 ms)
   34729 ms: std::tie<const unsigned long long, const unsigned long long> (1903 times, avg 18 ms)
   33466 ms: std::tie<const unsigned int, const unsigned int, const unsigned int, const unsigned int> (899 times, avg 37 ms)
   28112 ms: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (11220 times, avg 2 ms)
   23251 ms: std::tie<llvm::StringRef, llvm::StringRef> (1280 times, avg 18 ms)
   21820 ms: llvm::SmallDenseMap<void *, std::pair<llvm::PointerUnion<llvm::MetadataAsValue *, llvm::Metadata *>, unsigned long lon... (1298 times, avg 16 ms)
   21434 ms: std::tuple<const unsigned int &, const unsigned int &, const unsigned int &, const unsigned int &> (953 times, avg 22 ms)
   20595 ms: llvm::SmallVector<std::pair<void *, unsigned long long>, 0> (1972 times, avg 10 ms)
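Initially I thought the extern template declaration would help, but it
did not. Then I realized that Allocate was defined inline, and an extern
template declaration does not stop the compiler from instantiating
inline functions, because it still needs their bodies for inlining.
Moving Allocate out of line prevents that instantiation and realizes
the gain.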
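
As an illustration of the pattern described above, here is a minimal
single-file sketch. ToyBumpAllocator, its SlabSize parameter, and
bytesUsed() are hypothetical names made up for this example and are not
the actual LLVM code. In a real project the class, the out-of-line
definition, and the extern template declaration would live in the header,
and the explicit instantiation definition in exactly one source file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for an allocator template; not LLVM's class.
template <std::size_t SlabSize>
class ToyBumpAllocator {
public:
  ToyBumpAllocator() = default;
  ToyBumpAllocator(const ToyBumpAllocator &) = delete;
  ToyBumpAllocator &operator=(const ToyBumpAllocator &) = delete;
  ~ToyBumpAllocator() { std::free(Slab); }

  // Only declared here. Because the definition is outside the class body
  // and not marked inline, the function is not an inline function, so an
  // extern template declaration suppresses its implicit instantiation.
  void *Allocate(std::size_t Size);

  std::size_t bytesUsed() const { return Used; }

private:
  char *Slab = nullptr;
  std::size_t Used = 0;
};

// Out-of-line definition of the member function template.
template <std::size_t SlabSize>
void *ToyBumpAllocator<SlabSize>::Allocate(std::size_t Size) {
  assert(Used <= SlabSize && "bump pointer ran past the slab");
  if (!Slab)
    Slab = static_cast<char *>(std::malloc(SlabSize));
  if (!Slab || Size > SlabSize - Used)
    return nullptr; // Toy behavior: no second slab is allocated.
  void *Ptr = Slab + Used;
  Used += Size;
  return Ptr;
}

// Header side: promise that the instantiation exists elsewhere, so
// including translation units do not instantiate Allocate themselves.
extern template class ToyBumpAllocator<4096>;

// Source-file side: the single explicit instantiation definition.
template class ToyBumpAllocator<4096>;
```

With this layout, a caller that writes ToyBumpAllocator<4096> A; and then
A.Allocate(16) links against the one explicit instantiation instead of
instantiating Allocate again. The tradeoff is that Allocate can no longer
be inlined into callers without LTO.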