This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Optimize `exception_ptr`, especially for the empty case
Needs ReviewPublic

Authored by fwolff on Mar 26 2022, 3:05 PM.

Details

Reviewers
ldionne
Mordante
Group Reviewers
Restricted Project
Summary

Fixes #44892. The example given there does indeed optimize away completely with my changes.

Diff Detail

Event Timeline

fwolff created this revision.Mar 26 2022, 3:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 26 2022, 3:05 PM
fwolff requested review of this revision.Mar 26 2022, 3:05 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptMar 26 2022, 3:05 PM

I think this could be implemented in a non ABI breaking way by using __attribute__((__gnu_inline__)). See https://godbolt.org/z/W83rTb714 for an example. If you want to allow it in ABIv1 this way you should probably just disable the magic in ABIv2.

libcxx/src/support/runtime/exception_pointer_cxxabi.ipp
17

Is there a reason for this additional indirection?

Can you add some benchmarks to show how much improvements these changes bring?

fwolff added a comment.EditedApr 12 2022, 8:12 AM

I think this could be implemented in a non ABI breaking way by using __attribute__((__gnu_inline__)). See https://godbolt.org/z/W83rTb714 for an example. If you want to allow it in ABIv1 this way you should probably just disable the magic in ABIv2.

Sorry, but I haven't been able to get this to work. I don't really understand your example, and the documentation for gnu_inline is pretty sparse.

Yes, indeed, thanks for pointing this out!

Can you add some benchmarks to show how much improvements these changes bring?

I don't know what a sensible benchmark for this would look like, but have a look at the generated assembly:

Example 1 (from #44892): https://godbolt.org/z/YGq86cszM. With my changes, this compiles to just

0000000000000000 <_Z4testv>:
   0:	b0 01                	mov    al,0x1
   2:	c3                   	ret

Example 2 (from #39333): https://godbolt.org/z/oee13ennv. With my changes, this becomes

0000000000000000 <main>:
   0:	31 c0                	xor    eax,eax
   2:	c3                   	ret
fwolff marked an inline comment as done.Apr 12 2022, 8:27 AM
fwolff added inline comments.
libcxx/src/support/runtime/exception_pointer_cxxabi.ipp
17

There are two implementations of __incr/decr_refcount(), one here and one in exception_pointer_unimplemented.ipp. But maybe the latter one is unnecessary? I'm not sure.

Adding the move constructor and assignment operator isn't ABI-breaking AFAICT. IIUC the ABI breaking part of this patch is that the functions don't get generated anymore in the dylib. Using __attribute__((__gnu_inline__)) allows us to suppress generating the functions if they're not inlined, but allows the compiler to inline the function. Now if we don't apply the attribute when building the library the functions will be emitted, but otherwise they won't. This way we have the same linkage and so on for the dylib function, but have all the optimization opportunities of an inline function.

fwolff marked an inline comment as done.Apr 12 2022, 8:59 AM

Adding the move constructor and assignment operator isn't ABI-breaking AFAICT. IIUC the ABI breaking part of this patch is that the functions don't get generated anymore in the dylib. Using __attribute__((__gnu_inline__)) allows us to suppress generating the functions if they're not inlined, but allows the compiler to inline the function. Now if we don't apply the attribute when building the library the functions will be emitted, but otherwise they won't. This way we have the same linkage and so on for the dylib function, but have all the optimization opportunities of an inline function.

Yes, of course you're right about the move constructor and assignment operator, I had gotten this mixed up.

The GCC documentation says about gnu_inline that

The way to use this is to put a function definition in a header file with this attribute, and put another copy of the function, without extern, in a library file.

and

In C++, this attribute does not depend on extern in any way, but it still requires the inline keyword to enable its special behavior.

So I don't understand how this is supposed to work in C++. Let's consider a small example:
a.h:

#pragma once

struct A {
  int foo() { return 1; }
};

a.cc:

#include "a.h"

int A::foo() { return 1; }

Could you annotate this example with gnu_inline to illustrate your point? And, if you are going to use any #ifdefs, what should they check for in libc++? Thank you!

I think I'm abusing __gnu_inline__ here a bit, but hey:

First of all, you have to have both the declaration and the definition inside the header, because neither clang nor GCC are happy to have an out of line definition of an already defined function.
a.h:

#pragma once

#ifdef EMIT_FUNCTION
#  define GNU_INLINE
#else
#  define GNU_INLINE __attribute__((__gnu_inline__)) inline
#endif

struct A {
  GNU_INLINE int foo();
};

// Defining the function this way forces the compiler to emit the function when it's not marked inline,
// but if it's maked '__attribute__((__gnu_inline__)) inline' it forces the compiler to _not_ emit the function.
GNU_INLINE int A::foo() {
  return 1;
}

a.cpp:

#define EMIT_FUNCTION
#include "a.h"

You have to use a different name and suppress a warning, but I left that away for simplicity.

fwolff updated this revision to Diff 422294.Apr 12 2022, 10:55 AM

Thanks for your help @philnik. I have now revised my implementation so that the ABI break is no longer required.

Can you add some benchmarks to show how much improvements these changes bring?

I don't know what a sensible benchmark for this would look like, but have a look at the generated assembly:

Example 1 (from #44892): https://godbolt.org/z/YGq86cszM. With my changes, this compiles to just

0000000000000000 <_Z4testv>:
   0:	b0 01                	mov    al,0x1
   2:	c3                   	ret

Example 2 (from #39333): https://godbolt.org/z/oee13ennv. With my changes, this becomes

0000000000000000 <main>:
   0:	31 c0                	xor    eax,eax
   2:	c3                   	ret

That assembly looks convincing, thanks.

libcxx/include/exception
144

I would like some comment describing this _LIBCPP_EXCEPTION_PTR_GNU_INLINE magic. Reading the other comments in this review make it clear, could you add short summary of that information here?

145

We should validate whether the compiler supports this attribute.

183

Same at line 167.

fwolff updated this revision to Diff 423929.Apr 20 2022, 9:16 AM

Thanks for reviewing this @Mordante! I think I've addressed all of your comments, and I've added a suppression for -Wgnu-inline-cpp-without-extern, which will hopefully fix the failing tests.

fwolff marked 3 inline comments as done.Apr 20 2022, 9:19 AM
fwolff added inline comments.
libcxx/include/exception
145

The documentation seems to imply that either __GNUC_GNU_INLINE__ or __GNUC_STDC_INLINE__ will be defined if the __gnu_inline__ attribute is supported, so that's what I am now checking for. Let me know if you had something else in mind.

fwolff marked an inline comment as done.Apr 20 2022, 9:19 AM
philnik added inline comments.Apr 20 2022, 9:29 AM
libcxx/include/exception
150

I think you mean the __gnu_inline__ attribute here.

153

Do you mean _LIBCPP_EXCEPTION_PTR_GNU_INLINE here? If yes, please just put it here. It's a bit confusing otherwise.

154–155

__gnu_inline__ doesn't force the functions to be inlined. It disables the generation of the functions.

158

You can use __has_attribute(__gnu_inline__) to check if the compiler supports it. That also works for any other attribute.

240

I don't think that the comment is necessary here, but I also don't object.

242–244
libcxx/src/exception.cpp
9–11

Why do you guard the macro definition here? Could you add a short comment here why you are defining this?

Also please update the description.

fwolff edited the summary of this revision. (Show Details)Apr 20 2022, 4:28 PM
fwolff updated this revision to Diff 424057.Apr 20 2022, 4:48 PM
fwolff marked 6 inline comments as done.Apr 20 2022, 4:52 PM
fwolff added inline comments.
libcxx/src/exception.cpp
9–11

I think this was wrong; the macro definition can be unguarded and we unfortunately need more preprocessor guards in the exception header, because the glibcxx ABI doesn't have __incr/decr_refcount()...

fwolff marked an inline comment as done.Apr 20 2022, 4:52 PM

Are there still any remaining issues with this change?

As an user of std::exception_ptr (mostly for async callbacks and coroutines), I would love to see this land in LLVM 15 :)

Are there still any remaining issues with this change?

As an user of std::exception_ptr (mostly for async callbacks and coroutines), I would love to see this land in LLVM 15 :)

The latest patch didn't pass our pre-commit CI. So that needs to be fixed and it would be good to rebase the patch against main.

fwolff updated this revision to Diff 442409.Jul 5 2022, 4:29 PM

@Mordante I think the main problem were the inconsistent abilist files, which I have attempted to fix here (unfortunately generate-cxx-abilist didn't really work for me, as it created lots of other entries for things I didn't touch, so I had to do it manually). I've also rebased the patch, though the rebase went cleanly.

I think the main problem were the inconsistent abilist files, which I have attempted to fix here

This problem does indeed appear to be fixed now, judging by the latest CI run. But there are now some problems on Mac OS, which I cannot reproduce because I don't have such a machine. Perhaps someone else can have a look at this?

I think the main problem were the inconsistent abilist files, which I have attempted to fix here

This problem does indeed appear to be fixed now, judging by the latest CI run. But there are now some problems on Mac OS, which I cannot reproduce because I don't have such a machine. Perhaps someone else can have a look at this?

The problem is that you have introduced new symbols to the dylib. I think the easiest way to avoid this is to use __cxa_{increment, decrement}_exception_refcount in the header directly.

The problem is that you have introduced new symbols to the dylib. I think the easiest way to avoid this is to use __cxa_{increment, decrement}_exception_refcount in the header directly.

How is this supposed to work? Including cxxabi.h in the header breaks everything, and nowhere else in libcxx/include/ do any __cxa_* functions get called. Also, there must be a way to distinguish exception_pointer_cxxabi.ipp from exception_pointer_unimplemented.ipp in the header if we're not going to introduce any additional symbols; how should I go about doing this?

The problem is that you have introduced new symbols to the dylib. I think the easiest way to avoid this is to use __cxa_{increment, decrement}_exception_refcount in the header directly.

How is this supposed to work? Including cxxabi.h in the header breaks everything, and nowhere else in libcxx/include/ do any __cxa_* functions get called. Also, there must be a way to distinguish exception_pointer_cxxabi.ipp from exception_pointer_unimplemented.ipp in the header if we're not going to introduce any additional symbols; how should I go about doing this?

I think you should only enable the optimization for known-good C++ABIs (so only libc++abi currently) and to forward-declare the __cxa_ functions. That means the default still is calling the functions in exception_pointer_unimplemented.ipp.

fwolff updated this revision to Diff 442726.Jul 6 2022, 5:11 PM

Thanks for your help @philnik. I'm still unsure what exactly I should check for in the exception header; I went for

#if defined(_LIBCPP_ABI_VERSION) || defined(LIBCXXRT) || defined(LIBCXX_BUILDING_LIBCXXABI)

because the first two are used in libcxx/src/exception.cpp to indicate that the exception_pointer_cxxabi.ipp implementation is to be used, and it didn't work without the last one, but it doesn't really look right to me, so please comment.

Thanks for your help @philnik. I'm still unsure what exactly I should check for in the exception header; I went for

#if defined(_LIBCPP_ABI_VERSION) || defined(LIBCXXRT) || defined(LIBCXX_BUILDING_LIBCXXABI)

because the first two are used in libcxx/src/exception.cpp to indicate that the exception_pointer_cxxabi.ipp implementation is to be used, and it didn't work without the last one, but it doesn't really look right to me, so please comment.

I'm not actually sure what the right is here. I think @ldionne should take a look, as he is more familiar with these things.