This is an archive of the discontinued LLVM Phabricator instance.

BumpPtrAllocator: Use extern templates and out of line methods
Changes PlannedPublic

Authored by rnk on Feb 27 2020, 11:34 AM.

Details

Reviewers
hans
dblaikie
Summary

Prior to this change, BumpPtrAllocatorImpl<>::Allocate was instantiated
everywhere that included this file, and it was expensive, according to
ClangBuildAnalyzer:

  • Templates that took longest to instantiate: 44057 ms: llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096, 4096, 128>::Allocate (3230 times, avg 13 ms) 34729 ms: std::tie<const unsigned long long, const unsigned long long> (1903 times, avg 18 ms) 33466 ms: std::tie<const unsigned int, const unsigned int, const unsigned int, const unsigned int> (899 times, avg 37 ms) 28112 ms: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (11220 times, avg 2 ms) 23251 ms: std::tie<llvm::StringRef, llvm::StringRef> (1280 times, avg 18 ms) 21820 ms: llvm::SmallDenseMap<void *, std::pair<llvm::PointerUnion<llvm::MetadataAsValue *, llvm::Metadata *>, unsigned long lon... (1298 times, avg 16 ms) 21434 ms: std::tuple<const unsigned int &, const unsigned int &, const unsigned int &, const unsigned int &> (953 times, avg 22 ms) 20595 ms: llvm::SmallVector<std::pair<void *, unsigned long long>, 0> (1972 times, avg 10 ms)

Initially I thought the extern template decl would help, but it did not.
Then I realized that Allocate will be instantiated for optimization
purposes. Moving it out of line prevents that, and realizes the gain.

Diff Detail

Event Timeline

rnk created this revision.Feb 27 2020, 11:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 27 2020, 11:35 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
hans added a comment.Feb 27 2020, 11:59 AM

I thought we wanted BumpPtrAllocatorImpl<>::Allocate to be inline though, so that Size and Alignment could often be constant folded, and so that the fast path (when the allocation fits in the current slab) becomes small and fast.

Also, how does the extern template work here? It looks like you're declaring/defining it with the default template parameters. If someone instantiates it with other parameters, where will they get the Allocate() definition from? If it's never instantiated with the non-default params, maybe it shouldn't be a template in the first place -- this would be my preferred solution because it's remarkably complicated for a quick bump allocator.

rnk planned changes to this revision.Feb 27 2020, 1:58 PM

I thought we wanted BumpPtrAllocatorImpl<>::Allocate to be inline though, so that Size and Alignment could often be constant folded, and so that the fast path (when the allocation fits in the current slab) becomes small and fast.

Also, how does the extern template work here? It looks like you're declaring/defining it with the default template parameters. If someone instantiates it with other parameters, where will they get the Allocate() definition from? If it's never instantiated with the non-default params, maybe it shouldn't be a template in the first place -- this would be my preferred solution because it's remarkably complicated for a quick bump allocator.

Looking at it again, I think most of the expense comes from the "custom sized slab" handling. It calls SmallVector<std::pair<void*, size_t>>::push_back, which is probably where most of the instantiation time goes. If we could move that out of line and make it non-dependent, we'd be good to go.

Regarding the explicit instantiation, all of the methods are still defined in the header, so any users with non-default template args will instantiate their own definitions of Allocate & co. And I agree, this should be less templated, but last I checked, there are enough non-default users that it's not trivial. And 4096 is probably too small of a slab size, but that's a project for another day.

I think I'll play around with this some more to see if we can get the wins without blocking Allocate inlining. I'm not sure how much it matters, though.

hans added a comment.Feb 28 2020, 12:45 AM

Regarding the explicit instantiation, all of the methods are still defined in the header, so any users with non-default template args will instantiate their own definitions of Allocate & co.

Ah, makes sense. For some reason I imagined some code moving into the .cpp file, but I see that's not actually what you were doing.