This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Reduces std::to_chars instantiations.
ClosedPublic

Authored by Mordante on Jun 20 2022, 9:38 AM.

Details

Reviewers
ldionne
Group Reviewers
Restricted Project
Commits
rGcf927669eba9: [libc++] Reduces std::to_chars instantiations.
Summary

Instead of instantiating all functions called by std::to_chars for the
integral types only instantiate them for 32 and 64 bit integral types.
This results in a smaller binary when using different types.

In an example using the types: signed char, short, int, long, long long,
unsigned char, unsigned short, unsigned int, unsigned long, and
unsigned long long this saved 2792 bytes of code size. For libc++.so.1
is saves 688 bytes of code size (64-bit Linux).

This was discovered while investigating a solution for #52709.

Diff Detail

Event Timeline

Mordante created this revision.Jun 20 2022, 9:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 20 2022, 9:38 AM
Mordante requested review of this revision.Jun 20 2022, 9:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 20 2022, 9:38 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Mordante added inline comments.Jun 20 2022, 9:39 AM
libcxx/include/type_traits
1025

Note I'm not fond of this name, so I'm open to suggestions for a better name.

Is there a reason to not just instantiate it for that largest integral type? Or size_t and the largest integral type to avoid using multiple registers for smaller types?

libcxx/include/charconv
492
501

Why _Uglify it? You can just use type. Otherwise I would use _Type to make it clear that it's a type.

503
libcxx/include/type_traits
1024–1045

Could you put this into it's own header?

1025

Maybe __extend_to_32_64_or_128_bit_t? It's also not exactly perfect, but I think it describes it a bit better.

Is there a reason to not just instantiate it for that largest integral type? Or size_t and the largest integral type to avoid using multiple registers for smaller types?

Yes we still support 32-bit platforms, using a 64-bit value there would give a performance penalty, especially on platforms with a small number or registers. Another reason why I prefer 32-bits is that I'm quite sure that division of a 32-bit value by a constant is more efficient than the division of a 64-bit value. (The compiler will transform it in a multiplication and a shift which IIRC always fits in a 64-bit register, when using a 32-bit value.) This is what we're doing for base 10.

libcxx/include/type_traits
1024–1045

Yes I think that makes sense, but I first want to settle on a good name.

Yes we still support 32-bit platforms, using a 64-bit value there would give a performance penalty, especially on platforms with a small number or registers.

Isn't size_t 32 bits wide on 32-bit platforms?

Another reason why I prefer 32-bits is that I'm quite sure that division of a 32-bit value by a constant is more efficient than the division of a 64-bit value. (The compiler will transform it in a multiplication and a shift which IIRC always fits in a 64-bit register, when using a 32-bit value.) This is what we're doing for base 10.

Could you do a benchmark to check if using a 32-bit wide instantiation yields better performance even on 64-bit platforms?

ldionne accepted this revision.Jun 20 2022, 1:04 PM
ldionne added a subscriber: ldionne.
ldionne added inline comments.
libcxx/include/charconv
491

I think you need parentheses here.

501

While it's true that type is technically not a name that users can macro-ize (otherwise the world would break), it's still good to uglify our internal names for consistency. I don't care strongly about __type or _Type, but I do have a small preference for _Type. What I care strongly about is that we do uglify the name.

The only places were we use type should be where the Standard mandates that we have such a user-facing name in our API (in the type traits). I know in practice we sometimes use it for internal-only things because it's convenient, though.

libcxx/include/type_traits
1024

I would suggest this instead (the reformatting would make it easier to read IMO):

template <bool _Cond, class _If, class _Else>
using __conditional_t = typename conditional<_Cond, _If, _Else>::type;

template <class _Tp>
using __make_32_64_or_128_bit_t = __copy_unsigned_t<_Tp,
  __conditional_t<sizeof(_Tp) <= sizeof(int32_t),   int32_t,
  __conditional_t<sizeof(_Tp) <= sizeof(int64_t),   int64_t,
  __conditional_t<sizeof(_Tp) <= sizeof(__int128_t), __int128_t,
  /* else */                                         void
  >>>
>;

Here, __copy_unsigned_t would be basically

template <class _Tp, class _Up>
using __copy_unsigned_t = typename conditional<is_unsigned<_Tp>::value, make_unsigned_t<_Up>, _Up>;
1025

Just throwing another suggestion out there, but perhaps something like __widen_up_to_128_t would be reasonable?

This revision is now accepted and ready to land.Jun 20 2022, 1:04 PM
Mordante marked 6 inline comments as done.Jun 21 2022, 10:49 AM
Mordante added inline comments.
libcxx/include/charconv
491

For simplicity I just split it into two asserts, that makes removing 128-bit one easier.

libcxx/include/type_traits
1024–1045

After the rewrite to the suggestion above that's no longer an option; it uses make_unsigned. Do you have patches in progress to move that? If not I can make a followup patch to move make_unsigned and this new helper.

Mordante updated this revision to Diff 438762.Jun 21 2022, 10:53 AM
Mordante marked an inline comment as done.

Rebased and adresses review comments.

philnik added inline comments.Jun 21 2022, 11:06 AM
libcxx/include/type_traits
1024–1045

I don't have a patch for that currently. You can just leave the new trait here and I'll move it once I get around to granularizing <type_traits> further.

Mordante updated this revision to Diff 438775.Jun 21 2022, 11:20 AM

tab -> space

Mordante updated this revision to Diff 438783.Jun 21 2022, 11:43 AM

Fixes C++03.

Yes we still support 32-bit platforms, using a 64-bit value there would give a performance penalty, especially on platforms with a small number or registers.

Isn't size_t 32 bits wide on 32-bit platforms?

Another reason why I prefer 32-bits is that I'm quite sure that division of a 32-bit value by a constant is more efficient than the division of a 64-bit value. (The compiler will transform it in a multiplication and a shift which IIRC always fits in a 64-bit register, when using a 32-bit value.) This is what we're doing for base 10.

Could you do a benchmark to check if using a 32-bit wide instantiation yields better performance even on 64-bit platforms?

Actually I was wrong, for base 10 64-bit it's about as efficient. I test whether a 64-bit value fits in a 32-bit value and directly dispatch to a 32-bit value. So I guess that's why there's no penalty. But for the other bases there's a penalty. Several bases that were 25 ns before now are 30 ns. I haven't measured on 32-bit systems, but there I expect the penalty to be worse. So I will keep the 32 and 64 bit paths separated.

Mordante updated this revision to Diff 439031.Jun 22 2022, 8:27 AM
Mordante marked 4 inline comments as done.

Final polishing, when the CI passes I'll land this version.

Mordante added inline comments.Jun 22 2022, 8:58 AM
libcxx/include/type_traits
1025

Based on the suggestions I keep the original name, but thanks for them!

This revision was automatically updated to reflect the committed changes.