Page MenuHomePhabricator

[libc++][C++17] Elementary string conversions for integral types
ClosedPublic

Authored by lichray on Dec 20 2017, 11:19 AM.

Details

Summary

Major QoI considerations:

  • The facility is backported to C++14, same as libstdc++.
  • Efforts have been made to minimize the header dependencies.
  • The design is friendly to the uses of MSVC intrinsics (__emulu, _umul128, _BitScanForward, _BitScanForward64) but not implemented; future contributions are welcome.

Thanks to Milo Yip for contributing the implementation of __u64toa and __u32toa.

References:
https://wg21.link/p0067r5
https://wg21.link/p0682r1

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Addressing 'Done' comments

lichray updated this revision to Diff 138842.Mar 17 2018, 10:07 PM

Keep patches split

Sorry I've let this lie fallow for so long.

include/charconv
235

Why use the trailing return type here?
I don't see any advantage - it doesn't depend on the parameters (template or runtime).

src/support/itoa/itoa.cpp
1 ↗(On Diff #129714)

We can put both integral and floating point routines in the same source file ;-)

What Eric said - there should be just charconv.cpp, and no subdirectory.

Also, if this is not your work, then I need some notice (an email is fine) by the author saying that they're OK with contributing this under the LLVM license.

35 ↗(On Diff #129714)

It *does* matter, since we'll have to maintain this.

It would also be nice if they had meaningful names.

lichray marked 3 inline comments as done.Mar 22 2018, 2:58 AM
lichray added inline comments.
include/charconv
235

Because clang-format doesn't distinguish storage specifiers and simple-type-specifiers, and I want to format return types along with function signatures rather than letting hanging somewhere.

src/support/itoa/itoa.cpp
35 ↗(On Diff #129714)

I tried (template + forceinline), and the binary was optimized differently (emitted ~100 more instructions). The author have tuned these really well. A syntax like buffer = append1(buffer, v) may work better, I guess, but that adds lots of noises.

These macros are not complex. They just emit a integer of width 1234 to digits.

lichray updated this revision to Diff 139422.Mar 22 2018, 2:58 AM

Reorganize files

lichray updated this revision to Diff 141110.Apr 5 2018, 12:05 AM

Update module map

mclow.lists added inline comments.May 15 2018, 10:37 AM
include/charconv
90

But this header is backported to C++11, so I intended to not to guard it.

In general, we don't provide new features for old language versions.

The only time we've ever done that is for string_view, and I'm still not sure I did the right thing there.

src/charconv.cpp
35

I'd really like to see some numbers here showing how this (somewhat awkward) code performs better than the straightforward versions.

Because maintenance is forever.

Note: I'm sure that this was true on older compilers - but is it true today?

test/support/charconv_test_helpers.h
25

If this is supposed to be a C++17 or later header (and I'm pretty sure it is), you should add a static_assert(TEST_STD_VER > 14, ""); to this header.

lichray added inline comments.May 22 2018, 5:56 AM
include/charconv
90

We need to decide on this... From my point of view this header will be widely used by formatting and logging libraries, and it doesn't add much to the community by enforcing C++17 here, especially when the interface we specified are very simple and not using any features beyond C++11.

src/charconv.cpp
35

One thing I could try may be using return values rather than output parameters, like

buffer = append3(buffer, b);

But I really don't think doing these makes the code any bit more maintainable comparing to

APPEND3(b);

, which looks very straightforward and expressive to me...

Quuxplusone added inline comments.
include/charconv
90

This question is also relevant to my interests, in re <memory_resource>.

lichray updated this revision to Diff 153868.Jul 3 2018, 12:54 AM

A macro-free implementation

lichray marked an inline comment as done.Jul 3 2018, 1:00 AM
lichray added inline comments.
src/charconv.cpp
35

OK, after several attempts, I figured out that by eliminating side-effects to the buffer pointer in the helper functions, Clang now generates code of the same quality, and runs statistically faster.

lichray updated this revision to Diff 153888.Jul 3 2018, 3:06 AM

Install the header file

lichray added inline comments.Jul 9 2018, 2:27 PM
test/support/charconv_test_helpers.h
25

I can run it with --param=std=c++11.

mclow.lists added inline comments.Jul 9 2018, 8:48 PM
.gitignore
7 ↗(On Diff #129714)

I'm pretty sure this file doesn't belong in this diff.

include/charconv
90

From my point of view this header will be widely used by formatting and logging libraries,

Non-portable formatting and logging libraries - if we provide it before C++17 and they use it.

152

We could make this depend on <bit> once P0553 is adopted. (use countl_zero)

My intention there is to provide __countl_zero etc for internal use before C++20, and the public names for C++20 and later. That approach could be used here as well, if we choose.

235

That's an argument for fixing clang-format, not for writing the code this way.

241

I'm pretty sure we can't declare a type named traits - even inside a namespace.

test/std/utilities/charconv/charconv.from.chars/integral.pass.cpp
134

Rather than attempting to reuse bits of the string here, I would define a struct:

struct test_data {
    const char *input;
    const char *output;
    std::errc err;
    T value;
    };

and then write a bunch of test cases and iterate over them.

lichray marked 3 inline comments as done.Jul 10 2018, 7:27 AM
lichray added inline comments.
include/charconv
90

When they use it, what's next?

  1. Someone try to use the library against libstdc++, he/she
    • file a bug report to the library, or
    • file a bug report to libstdc++
  2. The library has too little users, so it's okay for it to be non-portable.

Looks good to me.

152

OK.

test/std/utilities/charconv/charconv.from.chars/integral.pass.cpp
134

No longer reusing parts from the string.

lichray updated this revision to Diff 154804.Jul 10 2018, 7:29 AM
lichray marked an inline comment as done.

Respond to the 2nd round review

lichray updated this revision to Diff 154816.Jul 10 2018, 8:38 AM

Less trailing return types

lichray marked an inline comment as done.Jul 10 2018, 8:53 AM
lichray added inline comments.
include/charconv
235

Yes... I did some research on ClangFormat, until... Anyway, I moved most of the trailing return types to the front, leaving two which are too complex in the header, and a few necessary ones in the tests.

lichray updated this revision to Diff 154842.Jul 10 2018, 10:38 AM

Dropping C++11 support.

lichray marked an inline comment as done.Jul 10 2018, 10:41 AM
lichray added inline comments.
include/charconv
90

Back-ported to C++14, in-line with libstdc++.

lichray edited the summary of this revision. (Show Details)Jul 10 2018, 10:43 AM
mclow.lists added inline comments.Jul 16 2018, 8:56 AM
include/charconv
373

Are you missing an edge case here? What happens if __last == __first && __value == 0?

test/support/charconv_test_helpers.h
41

We don't need to use ugly names here in the test suite.

mclow.lists added inline comments.Jul 16 2018, 9:01 AM
include/charconv
245

Same comment as above about read and inner_product - they need to be "ugly names"

359

I just want you to reassure me here - this lambda gets inlined, right?

mclow.lists added inline comments.Jul 16 2018, 9:03 AM
include/charconv
360

Thinking some more - did this used to do more? Because I don't see why having a lambda here is a benefit, as compared to:

const char *__digits = "0123456789abcdefghijklmnopqrstuvwxyz";

and

*--p = digits[__c];
lichray added inline comments.Jul 16 2018, 9:15 AM
include/charconv
245

Unlike traits which is a template parameter name in the standard, read and inner_product are function names in the standard, which means the users cannot make a macro for them (and there is no guarantee about what name you make not get by including certain headers), so we don't need to use ugly names here, am I right?

359

Yes -- I read my code in assembly.

360

I use a lambda here because it may do more in the future. If someone wants to let it support non-ASCII platforms, then they only need to make a patch against this lambda rather than changing the sites of uses. After all, there is nothing wrong to abstract out anything into a function, I think...

mclow.lists added inline comments.Jul 16 2018, 10:06 AM
include/charconv
373

Nevermind. Apparently I can't look two lines down.

mclow.lists added inline comments.Jul 16 2018, 1:29 PM
include/charconv
245

I understand your reasoning, but I don't agree.

Just last month, I had to rename a function in vector from allocate to __vallocate because it confused our "is this an allocator" detection. The function in question was private, so it shouldn't have mattered, but GCC has a bug where sometimes it partially ignores access restrictions in non-deduced contexts, and then throws a hard error when it comes back to a different context. The easiest workaround was to rename the function in vector.

Since then, I've been leery of public names that match others. This is pretty obscure, since it's in a private namespace, but I'd feel better if they were __read and __inner_product.

Quuxplusone added inline comments.Jul 16 2018, 2:38 PM
include/charconv
245

FWIW, +1 to ugly names. Even if the un-ugly code is "technically not broken yet", and besides the technical reason Marshall gives,
(1) it's nice that libc++ has style rules and sticks to them, precisely to *avoid* bikeshedding the name of every private member in the world;
(2) just because users can't #define read write doesn't mean they *won't* do it. I would actually be extremely surprised if read were *not* defined as a macro somewhere inside <windows.h>. :)

See also: "should this function call be _VSTD::-qualified?" Sometimes the answer is technically "no", but stylistically "yes", precisely to indicate that we *don't* intend for it to be an ADL customization point. Consistent style leads to maintainability.

lichray updated this revision to Diff 155780.Jul 16 2018, 4:03 PM

Uglify all the names

lichray marked 2 inline comments as done.Jul 16 2018, 4:12 PM
lichray added inline comments.
include/charconv
245

read is a function decl in <io.h>, not redefined in other forms in <windows.h>.

test/support/charconv_test_helpers.h
41

Err, it's not? Just an impl version of fits_in (without the _ prefix).

lichray marked an inline comment as done.Jul 16 2018, 4:13 PM
lichray edited the summary of this revision. (Show Details)Jul 18 2018, 12:53 PM

Ping. Any more comments?

Getting close here. I'll have a couple more comments later today, so don't post a new diff quite yet.

include/charconv
159

In general, we don't put _LIBCPP_COMPILER_XXX in header files in libc++.
Instead, we declare a feature macro _LIBCPP_HAS_XXXX in <__config> and then use that everywhere.

I see that most of the uses of _LIBCPP_COMPILER_MSVC are in <algorithm>, and center around the builtins for __builtin_clz, etc.

We can clean this up later. (/me makes note).

245

See also: "should this function call be _VSTD::-qualified

We already have an inner_product ( in <algorithm>), so I think not.

We _VSTD:: qualify move and forward always. min/max due mostly to windows macros. swap when need to disable ADL.

Other things we're inconsistent about: declval sometimes (I don't know why), use_facet, addressof, to_raw_pointer.

I'll figure out when we should mark things with _VSTD and write it up.

286

in libc++, we put the return type on the left.

template <typename _Tp>
inline _LIBCPP_INLINE_VISIBILITY
auto __to_unsigned(_Tp __x)

(and then ask 'why auto'?)

360

I don't think that making it a lambda today buys us anything. Since it is local, we can make it a lambda again in the future if we need to w/o breaking anyone.

Shades of YAGNI.

test/support/charconv_test_helpers.h
227

This name run concerns me. I'd feel better if it was run_integral_tests or something a bit less general.

On the other hand, it's in a test, so we can change it whenever we want.

lichray marked an inline comment as done.Jul 30 2018, 2:01 PM
lichray added inline comments.
include/charconv
159

MSVC has the intrinsics we need, but I did not plan to put too much in this patch. The suggested <bits> header sounds like a good place for grouping up those intrinsics in the future.

286
template <typename _Tp>
inline _LIBCPP_INLINE_VISIBILITY
auto __to_unsigned(_Tp __x)

I want this as well, but ClangFormat doesn't support it... The style which I'm using is close to FreeBSD style(9), please forgive. I don't mind if you send me a .clang-format file (the one I used is http://paste.openstack.org/show/726883/), but I won't manually adjust the code...

About why auto -- this file requires C++14 already, so I don't need to repeat my self (by writing make_unsigned_t<_Tp> again).

test/support/charconv_test_helpers.h
227

The callsites look like

run<test_signed>(all_signed);

That's why I named it "run". Not an issue for now I think.

Quuxplusone added a comment.EditedJul 30 2018, 8:53 PM

@lichray: I'm interested in merging this patch into my libc++ fork, but the latest update seems to be missing a ton of files. For example charconv_test_helpers.h is included by one of the tests, but does not exist. Also there's a lot of boilerplate missing: you ought to add the new header to module.modulemap, and to double_include.sh.cpp, and so on. Might you upload a new patch anytime soon?

EDIT: Weird, the diff is different now. Must have been some kind of race condition. Nvm.

lichray updated this revision to Diff 158404.Jul 31 2018, 3:28 PM

Dropping a lambda

mclow.lists accepted this revision.Jul 31 2018, 4:24 PM

LGTM. Thanks!

This revision is now accepted and ready to land.Jul 31 2018, 4:24 PM
This revision was automatically updated to reflect the committed changes.