This is an archive of the discontinued LLVM Phabricator instance.

[libc++][C++17] Elementary string conversions for integral types
ClosedPublic

Authored by lichray on Dec 20 2017, 11:19 AM.

Details

Summary

Major QoI considerations:

  • The facility is backported to C++14, same as libstdc++.
  • Efforts have been made to minimize the header dependencies.
  • The design is friendly to the uses of MSVC intrinsics (__emulu, _umul128, _BitScanForward, _BitScanForward64) but not implemented; future contributions are welcome.

Thanks to Milo Yip for contributing the implementation of __u64toa and __u32toa.

References:
https://wg21.link/p0067r5
https://wg21.link/p0682r1

Event Timeline

lichray created this revision.Dec 20 2017, 11:19 AM

I've got an implementation for this, too - at https://github.com/mclow/snippets/blob/master/to_chars.cpp

I'll compare them.

lichray updated this revision to Diff 127996.Dec 22 2017, 2:46 AM

Added std::from_chars

lichray retitled this revision from WIP: [libc++][C++17] Elementary string conversions for integral types to [libc++][C++17] Elementary string conversions for integral types.Dec 22 2017, 2:58 AM
lichray edited the summary of this revision. (Show Details)
lichray updated this revision to Diff 128071.Dec 23 2017, 12:39 AM

Use Marshall's method to generate digits in reversed order in generic to_chars.

lichray updated this revision to Diff 128393.Jan 1 2018, 1:26 PM

Just include math.h

lichray updated this revision to Diff 129714.Jan 12 2018, 2:55 PM

src/support/itoa/itoa.cpp in previous diffs were copyrighted by Tencent, now LLVM, contributed by the same author.

lichray edited the summary of this revision. (Show Details)Jan 12 2018, 2:59 PM
lichray edited the summary of this revision. (Show Details)
EricWF added a comment.Feb 9 2018, 8:31 PM

Some initial thoughts.

@mclow.lists Are you planning on moving forward with your implementation as well?

include/charconv
90

We need to hide these names when _LIBCPP_STD_VER < 17, since we're not allowed to introduce new names into namespace std in older dialects.

91

enum types should have their underlying integer type explicitly specified. Otherwise their size and ABI can break when users specify -short-enums

115

No reason for using trailing return type syntax here.

123

Same as above. There's no reason to deviate from the typical libc++ style and use trailing return types here.

152

<algorithm> already has a __clz wrapper for __builtin_clz et al. We should use that instead. That also allows us to get rid of the fallback implementation, and it correctly uses the builtin for compilers like GCC which don't provide __has_builtin.

include/support/itoa/itoa.h
1 ↗(On Diff #129714)

I would rather not introduce another header for this. I think it should go directly in the charconv header.

24 ↗(On Diff #129714)

I'm not sure I love having static globals in the headers, but I can't think of a better way to write this.

29 ↗(On Diff #129714)

The UINT64_C and UINT32_C macros are non-standard, so I would rather not use them in a header.

Additionally, the compiler should correctly convert the integer type to the array's element type, no?

47 ↗(On Diff #129714)

I suspect we can use __pow10_64 in the 32 bit case as well to avoid emitting another static global.

54 ↗(On Diff #129714)

Use __clz from <algorithm>. You can assume we always have that.

57 ↗(On Diff #129714)

__u64digits10 and __u32digits10 can be folded into width, since it's the only caller.

72 ↗(On Diff #129714)

We don't marke functions in the dylib with extern.

76 ↗(On Diff #129714)

Maybe a more descriptive name than __traits?

81 ↗(On Diff #129714)

width, convert and pow need to be reserved names.

123 ↗(On Diff #129714)

__builtin_mul_overflow works for char and short, so can't we just call that?

144 ↗(On Diff #129714)

GCC provides these builtins but __has_builtin won't detect them.

We can probably safely assume that every compiler except _LIBCPP_COMPILER_MSVC provides them, and only work around MSVC.

161 ↗(On Diff #129714)

traits needs to be a reserved name, and preferably a more descriptive one.

188 ↗(On Diff #129714)

What's wrong with using std::inner_product?

src/support/itoa/itoa.cpp
1 ↗(On Diff #129714)

This file should be renamed charconv.cpp and moved into the main source directory.

19 ↗(On Diff #129714)

This should be constexpr to ensure correct initialization.

35 ↗(On Diff #129714)

Any reason these can't be static functions? The compiler should optimize them away nicely.

test/std/utilities/charconv/charconv.from.chars/integral.pass.cpp
57

LIBCPP_ASSERT is only intended to test non-standard behavior that libc++ happens to have but that isn't required by the standard.

If this behavior is specified by the standard it should be tested using assert instead.

EricWF added a comment.Feb 9 2018, 8:35 PM

@lichray I should have mentioned: Although this header is C++17 only, the bits compiled into the dylib need to compile as C++11 still.

I will pick up the changes later next week.

include/charconv
90

But this header is backported to C++11, so I intended to not to guard it.

123

libc++ doesn't have one uniformed style. For some reason I found this style formats better with clang-format.

152

I saw that, and I agree this can be improved, however <algorithm> would be too heavy to include here. Thoughts?

include/support/itoa/itoa.h
81 ↗(On Diff #129714)

These names are standard names (functions), so users can't provide function style macros to override them. Am I wrong on that?

123 ↗(On Diff #129714)

I don't think so, it "works" after promotion.

161 ↗(On Diff #129714)

Similarly It's a standard name (std::string typedef), so user can't use a non-function style macro name to override it, IIUC.

188 ↗(On Diff #129714)

__first != __last1 instead of <

src/support/itoa/itoa.cpp
1 ↗(On Diff #129714)

We are going to have floating point cpp files so I don't think that one charconv.cpp is enough.

35 ↗(On Diff #129714)

Although yes, but that's what the author provides. It's an implementation file, so it doesn't matter I guess.

lichray marked 7 inline comments as done.Mar 11 2018, 7:44 PM

Pending patch update due to poor network.

include/charconv
91

The standard requires enum class to default to int; IIUC enum class is not affected by -fshort-enums.

include/support/itoa/itoa.h
1 ↗(On Diff #129714)

We may have separated floating point headers coming as well...

47 ↗(On Diff #129714)

I have functions returning them as array references.

54 ↗(On Diff #129714)

Factor __clz out of algorithm may be too much for this review. I would like to see a subsequent patch to take care of all the related portability fixes.

lichray marked an inline comment as not done.Mar 17 2018, 10:04 PM
lichray added inline comments.
include/support/itoa/itoa.h
29 ↗(On Diff #129714)

They are from C standard, and I included the corresponding C header. Dropping them gave me -Wimplicitly-unsigned-literal warning.

lichray updated this revision to Diff 138841.Mar 17 2018, 10:05 PM
lichray marked an inline comment as not done.

Addressing 'Done' comments

lichray updated this revision to Diff 138842.Mar 17 2018, 10:07 PM

Keep patches split

Sorry I've let this lie fallow for so long.

include/charconv
235

Why use the trailing return type here?
I don't see any advantage - it doesn't depend on the parameters (template or runtime).

src/support/itoa/itoa.cpp
1 ↗(On Diff #129714)

We can put both integral and floating point routines in the same source file ;-)

What Eric said - there should be just charconv.cpp, and no subdirectory.

Also, if this is not your work, then I need some notice (an email is fine) by the author saying that they're OK with contributing this under the LLVM license.

35 ↗(On Diff #129714)

It *does* matter, since we'll have to maintain this.

It would also be nice if they had meaningful names.

lichray marked 3 inline comments as done.Mar 22 2018, 2:58 AM
lichray added inline comments.
include/charconv
235

Because clang-format doesn't distinguish storage specifiers and simple-type-specifiers, and I want to format return types along with function signatures rather than letting hanging somewhere.

src/support/itoa/itoa.cpp
35 ↗(On Diff #129714)

I tried (template + forceinline), and the binary was optimized differently (emitted ~100 more instructions). The author have tuned these really well. A syntax like buffer = append1(buffer, v) may work better, I guess, but that adds lots of noises.

These macros are not complex. They just emit a integer of width 1234 to digits.

lichray updated this revision to Diff 139422.Mar 22 2018, 2:58 AM

Reorganize files

lichray updated this revision to Diff 141110.Apr 5 2018, 12:05 AM

Update module map

mclow.lists added inline comments.May 15 2018, 10:37 AM
include/charconv
90

But this header is backported to C++11, so I intended to not to guard it.

In general, we don't provide new features for old language versions.

The only time we've ever done that is for string_view, and I'm still not sure I did the right thing there.

src/charconv.cpp
35

I'd really like to see some numbers here showing how this (somewhat awkward) code performs better than the straightforward versions.

Because maintenance is forever.

Note: I'm sure that this was true on older compilers - but is it true today?

test/support/charconv_test_helpers.h
25

If this is supposed to be a C++17 or later header (and I'm pretty sure it is), you should add a static_assert(TEST_STD_VER > 14, ""); to this header.

lichray added inline comments.May 22 2018, 5:56 AM
include/charconv
90

We need to decide on this... From my point of view this header will be widely used by formatting and logging libraries, and it doesn't add much to the community by enforcing C++17 here, especially when the interface we specified are very simple and not using any features beyond C++11.

src/charconv.cpp
35

One thing I could try may be using return values rather than output parameters, like

buffer = append3(buffer, b);

But I really don't think doing these makes the code any bit more maintainable comparing to

APPEND3(b);

, which looks very straightforward and expressive to me...

Quuxplusone added inline comments.
include/charconv
90

This question is also relevant to my interests, in re <memory_resource>.

lichray updated this revision to Diff 153868.Jul 3 2018, 12:54 AM

A macro-free implementation

lichray marked an inline comment as done.Jul 3 2018, 1:00 AM
lichray added inline comments.
src/charconv.cpp
35

OK, after several attempts, I figured out that by eliminating side-effects to the buffer pointer in the helper functions, Clang now generates code of the same quality, and runs statistically faster.

lichray updated this revision to Diff 153888.Jul 3 2018, 3:06 AM

Install the header file

lichray added inline comments.Jul 9 2018, 2:27 PM
test/support/charconv_test_helpers.h
25

I can run it with --param=std=c++11.

mclow.lists added inline comments.Jul 9 2018, 8:48 PM
.gitignore
7

I'm pretty sure this file doesn't belong in this diff.

include/charconv
90

From my point of view this header will be widely used by formatting and logging libraries,

Non-portable formatting and logging libraries - if we provide it before C++17 and they use it.

152

We could make this depend on <bit> once P0553 is adopted. (use countl_zero)

My intention there is to provide __countl_zero etc for internal use before C++20, and the public names for C++20 and later. That approach could be used here as well, if we choose.

235

That's an argument for fixing clang-format, not for writing the code this way.

241

I'm pretty sure we can't declare a type named traits - even inside a namespace.

test/std/utilities/charconv/charconv.from.chars/integral.pass.cpp
134

Rather than attempting to reuse bits of the string here, I would define a struct:

struct test_data {
    const char *input;
    const char *output;
    std::errc err;
    T value;
    };

and then write a bunch of test cases and iterate over them.

lichray marked 3 inline comments as done.Jul 10 2018, 7:27 AM
lichray added inline comments.
include/charconv
90

When they use it, what's next?

  1. Someone try to use the library against libstdc++, he/she
    • file a bug report to the library, or
    • file a bug report to libstdc++
  2. The library has too little users, so it's okay for it to be non-portable.

Looks good to me.

152

OK.

test/std/utilities/charconv/charconv.from.chars/integral.pass.cpp
134

No longer reusing parts from the string.

lichray updated this revision to Diff 154804.Jul 10 2018, 7:29 AM
lichray marked an inline comment as done.

Respond to the 2nd round review

lichray updated this revision to Diff 154816.Jul 10 2018, 8:38 AM

Less trailing return types

lichray marked an inline comment as done.Jul 10 2018, 8:53 AM
lichray added inline comments.
include/charconv
235

Yes... I did some research on ClangFormat, until... Anyway, I moved most of the trailing return types to the front, leaving two which are too complex in the header, and a few necessary ones in the tests.

lichray updated this revision to Diff 154842.Jul 10 2018, 10:38 AM

Dropping C++11 support.

lichray marked an inline comment as done.Jul 10 2018, 10:41 AM
lichray added inline comments.
include/charconv
90

Back-ported to C++14, in-line with libstdc++.

lichray edited the summary of this revision. (Show Details)Jul 10 2018, 10:43 AM
mclow.lists added inline comments.Jul 16 2018, 8:56 AM
include/charconv
373

Are you missing an edge case here? What happens if __last == __first && __value == 0?

test/support/charconv_test_helpers.h
41

We don't need to use ugly names here in the test suite.

mclow.lists added inline comments.Jul 16 2018, 9:01 AM
include/charconv
245

Same comment as above about read and inner_product - they need to be "ugly names"

359

I just want you to reassure me here - this lambda gets inlined, right?

mclow.lists added inline comments.Jul 16 2018, 9:03 AM
include/charconv
360

Thinking some more - did this used to do more? Because I don't see why having a lambda here is a benefit, as compared to:

const char *__digits = "0123456789abcdefghijklmnopqrstuvwxyz";

and

*--p = digits[__c];
lichray added inline comments.Jul 16 2018, 9:15 AM
include/charconv
245

Unlike traits which is a template parameter name in the standard, read and inner_product are function names in the standard, which means the users cannot make a macro for them (and there is no guarantee about what name you make not get by including certain headers), so we don't need to use ugly names here, am I right?

359

Yes -- I read my code in assembly.

360

I use a lambda here because it may do more in the future. If someone wants to let it support non-ASCII platforms, then they only need to make a patch against this lambda rather than changing the sites of uses. After all, there is nothing wrong to abstract out anything into a function, I think...

mclow.lists added inline comments.Jul 16 2018, 10:06 AM
include/charconv
373

Nevermind. Apparently I can't look two lines down.

mclow.lists added inline comments.Jul 16 2018, 1:29 PM
include/charconv
245

I understand your reasoning, but I don't agree.

Just last month, I had to rename a function in vector from allocate to __vallocate because it confused our "is this an allocator" detection. The function in question was private, so it shouldn't have mattered, but GCC has a bug where sometimes it partially ignores access restrictions in non-deduced contexts, and then throws a hard error when it comes back to a different context. The easiest workaround was to rename the function in vector.

Since then, I've been leery of public names that match others. This is pretty obscure, since it's in a private namespace, but I'd feel better if they were __read and __inner_product.

include/charconv
245

FWIW, +1 to ugly names. Even if the un-ugly code is "technically not broken yet", and besides the technical reason Marshall gives,
(1) it's nice that libc++ has style rules and sticks to them, precisely to *avoid* bikeshedding the name of every private member in the world;
(2) just because users can't #define read write doesn't mean they *won't* do it. I would actually be extremely surprised if read were *not* defined as a macro somewhere inside <windows.h>. :)

See also: "should this function call be _VSTD::-qualified?" Sometimes the answer is technically "no", but stylistically "yes", precisely to indicate that we *don't* intend for it to be an ADL customization point. Consistent style leads to maintainability.

lichray updated this revision to Diff 155780.Jul 16 2018, 4:03 PM

Uglify all the names

lichray marked 2 inline comments as done.Jul 16 2018, 4:12 PM
lichray added inline comments.
include/charconv
245

read is a function decl in <io.h>, not redefined in other forms in <windows.h>.

test/support/charconv_test_helpers.h
41

Err, it's not? Just an impl version of fits_in (without the _ prefix).

lichray marked an inline comment as done.Jul 16 2018, 4:13 PM
lichray edited the summary of this revision. (Show Details)Jul 18 2018, 12:53 PM

Ping. Any more comments?

Getting close here. I'll have a couple more comments later today, so don't post a new diff quite yet.

include/charconv
159

In general, we don't put _LIBCPP_COMPILER_XXX in header files in libc++.
Instead, we declare a feature macro _LIBCPP_HAS_XXXX in <__config> and then use that everywhere.

I see that most of the uses of _LIBCPP_COMPILER_MSVC are in <algorithm>, and center around the builtins for __builtin_clz, etc.

We can clean this up later. (/me makes note).

245

See also: "should this function call be _VSTD::-qualified

We already have an inner_product ( in <algorithm>), so I think not.

We _VSTD:: qualify move and forward always. min/max due mostly to windows macros. swap when need to disable ADL.

Other things we're inconsistent about: declval sometimes (I don't know why), use_facet, addressof, to_raw_pointer.

I'll figure out when we should mark things with _VSTD and write it up.

286

in libc++, we put the return type on the left.

template <typename _Tp>
inline _LIBCPP_INLINE_VISIBILITY
auto __to_unsigned(_Tp __x)

(and then ask 'why auto'?)

360

I don't think that making it a lambda today buys us anything. Since it is local, we can make it a lambda again in the future if we need to w/o breaking anyone.

Shades of YAGNI.

test/support/charconv_test_helpers.h
227

This name run concerns me. I'd feel better if it was run_integral_tests or something a bit less general.

On the other hand, it's in a test, so we can change it whenever we want.

lichray marked an inline comment as done.Jul 30 2018, 2:01 PM
lichray added inline comments.
include/charconv
159

MSVC has the intrinsics we need, but I did not plan to put too much in this patch. The suggested <bits> header sounds like a good place for grouping up those intrinsics in the future.

286
template <typename _Tp>
inline _LIBCPP_INLINE_VISIBILITY
auto __to_unsigned(_Tp __x)

I want this as well, but ClangFormat doesn't support it... The style which I'm using is close to FreeBSD style(9), please forgive. I don't mind if you send me a .clang-format file (the one I used is http://paste.openstack.org/show/726883/), but I won't manually adjust the code...

About why auto -- this file requires C++14 already, so I don't need to repeat my self (by writing make_unsigned_t<_Tp> again).

test/support/charconv_test_helpers.h
227

The callsites look like

run<test_signed>(all_signed);

That's why I named it "run". Not an issue for now I think.

Quuxplusone added a comment.EditedJul 30 2018, 8:53 PM

@lichray: I'm interested in merging this patch into my libc++ fork, but the latest update seems to be missing a ton of files. For example charconv_test_helpers.h is included by one of the tests, but does not exist. Also there's a lot of boilerplate missing: you ought to add the new header to module.modulemap, and to double_include.sh.cpp, and so on. Might you upload a new patch anytime soon?

EDIT: Weird, the diff is different now. Must have been some kind of race condition. Nvm.

lichray updated this revision to Diff 158404.Jul 31 2018, 3:28 PM

Dropping a lambda

mclow.lists accepted this revision.Jul 31 2018, 4:24 PM

LGTM. Thanks!

This revision is now accepted and ready to land.Jul 31 2018, 4:24 PM
This revision was automatically updated to reflect the committed changes.