Prefer emplace_back() over push_back() where appropriate to avoid extra calls to copy and/or move constructors.
Details
Diff Detail
Event Timeline
Using emplacement instead of construction is a good idea, where it makes sense. However, I am not convinced that adding constructors to otherwise-trivial structures is worth it. These structures are usually trivially copyable and/or movable and so an emplacement does not really save much.
And I am definitely sure that the constructors we do add should be taking (lvalue) references -- that precludes construction with anything other than an lvalue, which is just weird. They also force at least one copy, whereas a equivalent (properly written) push_back call could use no copies.
So, I'd just limit this patch to the cases where one does not need to add additional constructors to make emplace_back work.
Use std::pair rather than an ad-hoc two-member trivial structures. Since std::pair provides the default constructor to initialize both members, no extra C++ glue is required for emplace_back()'ing the pairs into containers.
I am sorry, but I don't consider that an improvement. The reason these were not pairs in the first place is because the first/second accessors are very nondescript. If you can demonstrate that these actually improve performance, then we can talk about the best way to do this. However, I'd be very surprised if any of these show up in a profile.
OK let's reduce it to minimum minimorum.
As for the profiling, l can see possible improvement around DWARFAttributes::Append() - inlining even at -O2 and eliminating extra constructor call makes
10.61% intern-state liblldb.so.12.0.0git [.] DWARFDebugInfoEntry::GetAttributes 7.28% intern-state liblldb.so.12.0.0git [.] DWARFDebugInfoEntry::Extract 6.88% intern-state liblldb.so.12.0.0git [.] lldb_private::ManualDWARFIndex::IndexUnitImpl 5.76% intern-state libpthread-2.31.so [.] __pthread_rwlock_rdlock 5.43% intern-state liblldb.so.12.0.0git [.] lldb_private::DataExtractor::GetU32 4.70% intern-state liblldb.so.12.0.0git [.] DWARFUnit::GetDIE 4.49% intern-state liblldb.so.12.0.0git [.] lldb_private::DWARFContext::LoadOrGetSection 4.37% intern-state libLLVM-12git.so [.] llvm::StringMapImpl::FindKey 3.50% intern-state liblldb.so.12.0.0git [.] DWARFFormValue::GetFixedSize 2.73% intern-state liblldb.so.12.0.0git [.] DWARFUnit::ExtractDIEsRWLocked 2.61% intern-state liblldb.so.12.0.0git [.] Pool::GetConstCStringWithStringRef 2.08% intern-state liblldb.so.12.0.0git [.] lldb_private::DataExtractor::GetULEB128 1.82% intern-state ld-2.31.so [.] __tls_get_addr 1.77% intern-state liblldb.so.12.0.0git [.] std::__introsort_loop<__gnu_cxx::__normal_iterator<lldb_private::UniqueCStringMap<DIERef>::Entry*, std::vector<lldb_private::UniqueCStringMap<DIERef>::Entry, std::allocator<lldb_private::UniqueCStringMap<DIERef>::Entry> > >, long, __gnu_cxx::__ops::_Iter_comp_iter<lldb_private::Uniqu eCStringMap<DIERef>::Compare> > 1.60% intern-state liblldb.so.12.0.0git [.] DWARFFormValue::ExtractValue 1.55% intern-state libpthread-2.31.so [.] __pthread_rwlock_unlock 1.23% intern-state libc-2.31.so [.] __strlen_avx2 1.15% intern-state libc-2.31.so [.] _int_free 1.15% intern-state libc-2.31.so [.] malloc 1.14% intern-state liblldb.so.12.0.0git [.] DWARFAttributes::ExtractFormValueAtIndex
from:
7.68% intern-state liblldb.so.12.0.0git [.] DWARFAttributes::Append 7.25% intern-state liblldb.so.12.0.0git [.] DWARFDebugInfoEntry::Extract 6.34% intern-state liblldb.so.12.0.0git [.] lldb_private::ManualDWARFIndex::IndexUnitImpl 6.30% intern-state liblldb.so.12.0.0git [.] DWARFDebugInfoEntry::GetAttributes 5.55% intern-state libpthread-2.31.so [.] __pthread_rwlock_rdlock 5.51% intern-state liblldb.so.12.0.0git [.] lldb_private::DataExtractor::GetU32 4.50% intern-state liblldb.so.12.0.0git [.] DWARFUnit::GetDIE 4.34% intern-state liblldb.so.12.0.0git [.] lldb_private::DWARFContext::LoadOrGetSection 4.27% intern-state libLLVM-12git.so [.] llvm::StringMapImpl::FindKey 2.92% intern-state liblldb.so.12.0.0git [.] DWARFFormValue::GetFixedSize 2.65% intern-state liblldb.so.12.0.0git [.] DWARFUnit::ExtractDIEsRWLocked 2.51% intern-state liblldb.so.12.0.0git [.] Pool::GetConstCStringWithStringRef 1.93% intern-state liblldb.so.12.0.0git [.] lldb_private::DataExtractor::GetULEB128 1.79% intern-state liblldb.so.12.0.0git [.] std::__introsort_loop<__gnu_cxx::__normal_iterator<lldb_private::UniqueCStringMap<DIERef>::Entry*, std::vector<lldb_private::UniqueCStringMap<DIERef>::Entry, std::allocator<lldb_private::UniqueCStringMap<DIERef>::Entry> > >, long, __gnu_cxx::__ops::_Iter_comp_iter<lldb_private::UniqueCStringMap<DIERef>::Compare> > 1.78% intern-state ld-2.31.so [.] __tls_get_addr 1.46% intern-state libpthread-2.31.so [.] __pthread_rwlock_unlock 1.45% intern-state liblldb.so.12.0.0git [.] DWARFFormValue::ExtractValue 1.22% intern-state libc-2.31.so [.] __strlen_avx2 1.16% intern-state libc-2.31.so [.] malloc 1.11% intern-state libc-2.31.so [.] _int_free 1.08% intern-state liblldb.so.12.0.0git [.] DWARFAttributes::ExtractFormValueAtIndex
when loading symbols from clang + libLLVM.so (release build with '-O2 -g').
Ok, this is something we can talk about. dwarf parsing is a big bottle neck, so spending some time to optimize that (and even sacrifice some readability/mantainability/etc. if needed) is worthwhile.
That said, I am pretty sure that any performance gains you're seeing (you didn't explain what/how you're measuring so it's hard to say for sure), are due to the additional inlining opportunities afforded by moving this function to the header, and _not_ because of the use of emplace_back. In my comparison, the two implementations ended up looking fairly different, but I don't think the differences cannot be explained by the use of emplace_back. In both cases, the compiler was able to fully optimize away all the push_backs, constructor calls, etc., and the fast path for both functions consists of a bunch of move instructions. If one of the two versions ends up being slower, then this indicates a missed optimization opportunity on the side of the compiler (which could be removed or even reversed by using a different compiler), and not a fundamental problem with using push_back with trivially copyable structs.
_That_ said, I don't think adding the constructor is necessarily bad in this case, so I think it's fine to stick with the current implementation.