Differential D22782: Added 'inline' attribute to __init to inline basic_string's constructor

Authored by laxmansole on Jul 25 2016, 2:41 PM.

Details

basic_string's constructor calls __init, which was not getting inlined. This prevented optimization of a const string, because __init appeared as an opaque call between the string's definition and its uses.

Worked in collaboration with Aditya Kumar.
Event Timeline

Comment Actions

I think we can write a testcase that shows that copy constructors are not optimized away unless the string constructor is inlined. This patch fixes the performance of a proprietary benchmark when compiled with libc++, closing the performance gap with the same benchmark compiled with the GNU libstdc++. Overall, with this patch we execute half a billion fewer instructions out of about 10 billion.

Comment Actions

We are looking at the libcxx test suite; it seems there are only correctness/functionality tests. Do you have any pointers on how to add tests that check whether functions are inlined? In LLVM this is easy because we can print the assembly and use FileCheck/grep to CHECK/CHECK-NOT that a string is present.

Comment Actions

$ cat foo.cpp
#include <string>

int foo(const std::string name);
int bar() {
    return foo("bar");
}
$ clang++ -S -O3 -fno-exceptions foo.cpp

Assembly output without patch:

_Z3barv: // @_Z3barv
// BB#0: // %entry
sub sp, sp, #64 // =64
adrp x1, .L.str
add x1, x1, :lo12:.L.str
add x0, sp, #8 // =8
orr w2, wzr, #0x3
stp xzr, x19, [sp, #24] // 8-byte Folded Spill
stp x29, x30, [sp, #48] // 8-byte Folded Spill
add x29, sp, #48 // =48
stp xzr, xzr, [sp, #8]
bl _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEPKcm
add x0, sp, #8 // =8
bl _Z3fooNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE
and w19, w0, #0x1
add x0, sp, #8 // =8
bl _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEED1Ev
mov w0, w19
ldp x29, x30, [sp, #48] // 8-byte Folded Reload
ldr x19, [sp, #32] // 8-byte Folded Reload
add sp, sp, #64 // =64
ret

Assembly output with patch:

_Z3barv: // @_Z3barv
// BB#0: // %entry
sub sp, sp, #64 // =64
orr w8, wzr, #0x6
mov w9, #114
mov w10, #24930
add x0, sp, #8 // =8
stp xzr, x19, [sp, #24] // 8-byte Folded Spill
stp x29, x30, [sp, #48] // 8-byte Folded Spill
add x29, sp, #48 // =48
stp xzr, xzr, [sp, #8]
strb w8, [sp, #8]
strb w9, [sp, #11]
sturh w10, [sp, #9]
strb wzr, [sp, #12]
bl _Z3fooNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE
and w19, w0, #0x1
add x0, sp, #8 // =8
bl _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEED1Ev
mov w0, w19
ldp x29, x30, [sp, #48] // 8-byte Folded Reload
ldr x19, [sp, #32] // 8-byte Folded Reload
add sp, sp, #64 // =64
ret

We can see that the call to _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEPKcm has been inlined away.

Comment Actions

The change itself LGTM, although we probably want to inline the forward/input iterator __init's as well. However, I would like to see a small benchmark that demonstrates the performance change. Please try to write the benchmark using Google Benchmark.

Comment Actions

Sure, thanks.

Comment Actions

Added the inline attribute to the forward/input iterator __init's.

Comment Actions

I would love to see a benchmark with this, but I've done enough investigating on my own that I *know* this patch is beneficial.

Comment Actions

This patch was motivated by perf analysis we did on a proprietary benchmark, in which we have seen a reduction of about 1 billion instructions (out of 10B) on x86_64-linux and aarch64-linux.