This is an archive of the discontinued LLVM Phabricator instance.

Added 'inline' attribute to basic_string's destructor
AbandonedPublic

Authored by hiraditya on Oct 14 2016, 9:11 AM.

Details

Summary

Author: laxmansole

Original Differential Revision: https://reviews.llvm.org/D22834

Posting the patch again as the bug https://llvm.org/bugs/show_bug.cgi?id=30341 is fixed.

Currently basic_string's destructor is not getting inlined. So adding 'inline' attribute to ~basic_string().
Worked in collaboration with Aditya Kumar.

git-svn-id: https://llvm.org/svn/llvm-project/libcxx/trunk@280944 91177308-0d34-0410-b5e6-96231b3b80d8

Diff Detail

Event Timeline

hiraditya updated this revision to Diff 74699.Oct 14 2016, 9:11 AM
hiraditya retitled this revision from to Added 'inline' attribute to basic_string's destructor.
hiraditya updated this object.
hiraditya added reviewers: mclow.lists, EricWF.
hiraditya added subscribers: laxmansole, sebpop, cfe-commits.
sebpop accepted this revision.Oct 14 2016, 4:59 PM
sebpop added a reviewer: sebpop.

This got approved in the past review.
Let's commit it now that the clang bug was fixed.
Thanks Aditya for keeping track of this.

This revision is now accepted and ready to land.Oct 14 2016, 4:59 PM
EricWF edited edge metadata.Oct 14 2016, 5:02 PM

Has the fix been merged into the 3.9 branch? Does re-adding this attribute mean that Clang 4.0 is required to build LLVM w/ libc++?

If I remember correctly, we pushed the fix after 3.9 was released.
Could you please explain the problem of requiring trunk LLVM to build trunk libcxx?

mclow.lists edited edge metadata.Oct 19 2016, 11:12 AM

I don't have a problem with this being marked as inline, as long as it doesn't disappear out of the dylib.

There *has* to be a version of basic_string<char, char_traits<char>, Allocator<char>>::~basic_string in the dylib - existing applications depend upon it. (same for wchar_t).

mclow.lists added a comment.EditedOct 19 2016, 11:13 AM

This is the same issue as in D24991

hiraditya updated this revision to Diff 75915.Oct 26 2016, 10:11 AM
hiraditya edited edge metadata.

Added macro to keep the definition of destructor of basic_string in string.cpp

sebpop added inline comments.Oct 27 2016, 7:44 AM
libcxx/include/string
1837

and let's also use the define just here:

#ifndef _LIBCPP_BUILDING_STRING
inline _LIBCPP_INLINE_VISIBILITY
#endif
libcxx/src/string.cpp
10 ↗(On Diff #75915)

To be consistent with memory.cpp:

#define _LIBCPP_BUILDING_MEMORY

let's rename the define as:

#define _LIBCPP_BUILDING_STRING
hiraditya updated this revision to Diff 76192.Oct 28 2016, 8:17 AM

Addressed Sebastian's comments.

Other than removing that comment the patch looks good.
Thanks!

libcxx/src/string.cpp
10 ↗(On Diff #76192)

Let's not add this comment here, as this define may be used in the future for other purposes. Also a simple grep would give you the places where this is used.

Actually we don't want to copy memory.cpp in this case. Please use _LIBCPP_EXTERN_TEMPLATE_INLINE_VISIBILITY. There are only examples in <string> you can follow. Make sure to put it on the in-class declaration.

Additionally _LIBCPP_EXTERN_TEMPLATE_INLINE_VISIBILITY is documented here http://libcxx.llvm.org/docs/DesignDocs/VisibilityMacros.html.

hiraditya updated this revision to Diff 76240.Oct 28 2016, 1:09 PM
EricWF accepted this revision.Oct 28 2016, 1:11 PM
EricWF edited edge metadata.
EricWF added inline comments.
libcxx/include/string
1837

The attribute should appear on the first declaration not the out-of-line definition. Feel free to commit after making that change.

hiraditya added inline comments.Oct 28 2016, 1:43 PM
libcxx/include/string
1837

Okay, thanks for the correction.

I don't understand:

  1. The motivation for this change
  2. Why is this correct?

Can we restart the discussion here (I'm reverting to fix the LTO build failure here: http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build/10737/console )

I don't understand:

  1. The motivation for this change

This is a change for performance: we have seen some benchmarks where inlining the string dtor brings performance up by 5%: from what I remember, there is a string ctor shortly followed by a dtor that could be optimized away if both the ctor and dtor are inlined. We already committed the patch to do the ctor inlining, and that gave us some performance. This patch is what remains to get the rest of the performance by inlining the string dtor.

  1. Why is this correct?

There seems to be a problem the patch uncovers, and we will investigate.
Thanks for pointing out this issue.

Trying to figure out: it seems you are trying to address what I reported here: https://llvm.org/bugs/show_bug.cgi?id=26498 ; right?

Except for inline functions, declarations with types deduced
from their initializer or return value (7.1.6.4), const
variables of literal types, variables of reference types,
and class template specializations, explicit instantiation
declarations have the effect of suppressing the implicit
instantiation of the entity to which they refer. [ Note:
The intent is that an inline function that is the subject
of an explicit instantiation declaration will still be
implicitly instantiated when odr-used (3.2) so that the
body can be considered for inlining, but that no
out-of-line copy of the inline function would be generated
in the translation unit. — end note ]

So the _LIBCPP_EXTERN_TEMPLATE(class _LIBCPP_TYPE_VIS basic_string<char>) block the instantiation of any non-inline template function.
The effect of the "inline" keyword is actually allowing clang to actually instantiate and IRGen the function, so that it is available for inlining.

This seems like a generally good problem to solve for every such function and not only the destructor (I mentioned __init() as well in PR26498).

(The commit message is confusing, it mentions "Currently basic_string's destructor is not getting inlined. So adding 'inline' attribute to ~basic_string()", please add the quote of the standard and mention that it enables instantiation)

(The commit message is confusing, it mentions "Currently basic_string's destructor is not getting inlined. So adding 'inline' attribute to ~basic_string()", please add the quote of the standard and mention that it enables instantiation)

(i.e. "inline" does not cause directly "inlining" here, this comes as a side effect of just starting to emit it)

FYI reproduced locally, and investigating now.

I see a decl:

declare hidden void @_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEED1Ev(%"class.std::__1::basic_string"*) unnamed_addr #9 align 2

(Note the hidden which prevent from finding it in the dyib)

And a use (after inlining):

%1024 = call i32 @__cxa_atexit(void (i8*)* bitcast (void (%"class.std::__1::basic_string"*)* @_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEED1Ev to void (i8*)*), i8* bitcast (%"class.std::__1::basic_string"* @_ZZN12_GLOBAL__N_13DFA16writeTableAndAPIERN4llvm11raw_ostreamERKNSt3__112basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEEiiiiE13SentinelEntry to i8*), i8* nonnull @__dso_handle) #1

I believe the issue is _LIBCPP_EXTERN_TEMPLATE_INLINE_VISIBILITY itself. I asked here: http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20161024/175454.html

Before this gets re-committed, I'd like to understand why only the destructor?

I talked with Eric on IRC, he mentioned some benchmarks were ran. I'd like to understand what was the baseline?
Here we add *both* the inline keyword and the always_inline attribute. I'd like to know if there is a benchmarks that shows that always_inline is beneficial on top of the inline keyword.
If we need to add always_inline anywhere: this is likely an inliner heuristic failure and we should at minima track it as an example to improve it.

I'd like to understand why only the destructor?

We have committed a patch to inline the constructor of std::string.
Do you think we should inline some other functions?

I talked with Eric on IRC, he mentioned some benchmarks were ran. I'd like to understand what was the baseline?

We ran a proprietary benchmark compiled with clang + libc++ and compared against clang + libstdc++.
With this patch, libc++ is on par with libstdc++.

Here we add *both* the inline keyword and the always_inline attribute. I'd like to know if there is a benchmarks that shows that always_inline is beneficial on top of the inline keyword.
If we need to add always_inline anywhere: this is likely an inliner heuristic failure and we should at minima track it as an example to improve it.

This is a good observation: I am not sure we ran the benchmark without the always_inline attribute.
We will try again wihout that attribute and report whether it matters performance wise.
Thanks Mehdi for catching this!

I talked with Eric on IRC, he mentioned some benchmarks were ran. I'd like to understand what was the baseline?
Here we add *both* the inline keyword and the always_inline attribute. I'd like to know if there is a benchmarks that shows that always_inline is beneficial on top of the inline keyword.
If we need to add always_inline anywhere: this is likely an inliner heuristic failure and we should at minima track it as an example to improve it.

I remmber that the constructor of std::string wouldn't inline without always_inline attribute D22782

I'd like to understand why only the destructor?

We have committed a patch to inline the constructor of std::string.
Do you think we should inline some other functions?

I'd say all of them (unless there's a reason to not do) :)
So: default to the inline keyword, with possibility of opt-out on a case by case.

The only impact of the inline attribute, as I understand, is to make sure some IR is emitted. There won't be any codegen, and the compile-time impact should be reasonnable.

I talked with Eric on IRC, he mentioned some benchmarks were ran. I'd like to understand what was the baseline?
Here we add *both* the inline keyword and the always_inline attribute. I'd like to know if there is a benchmarks that shows that always_inline is beneficial on top of the inline keyword.
If we need to add always_inline anywhere: this is likely an inliner heuristic failure and we should at minima track it as an example to improve it.

I remmber that the constructor of std::string wouldn't inline without always_inline attribute D22782

I don't see where is it mentioned in the revision you are linking, can you elaborate what I should look at?

(The commit message is confusing, it mentions "Currently basic_string's destructor is not getting inlined. So adding 'inline' attribute to ~basic_string()", please add the quote of the standard and mention that it enables instantiation)

(i.e. "inline" does not cause directly "inlining" here, this comes as a side effect of just starting to emit it)

You are right. The problem was that the function definition wasn't getting generated in the translation unit as a result the inliner cannot inline. For some reason clang does not generate the definition of available_externally functions. Sorry for the confusion, this patch is sitting for a long time so I forgot about it. Just adding inline should also have the same effect. I can commit a separate patch with just the inline specifier if there is an agreement.

I think @EricWF already committed the patch with just the inline keyword?

hiraditya abandoned this revision.Nov 7 2016, 7:16 PM

I think @EricWF already committed the patch with just the inline keyword?

Yes he's done already. Closing this.
Thanks for your feedback.