This is an archive of the discontinued LLVM Phabricator instance.

Differential D99091

[locale][num_get] Improve Stage 2 of string to float conversion
AbandonedPublic

Authored by tmatheson on Mar 22 2021, 10:18 AM.

Details

Reviewers

• Quuxplusone
zoecarver
miyuki
ldionne

Group Reviewers

Restricted Project

Summary

https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.2

"Stage 2" of num_get::do_get() depends on "a check ... to determine if c is
allowed as the next character of an input field of the conversion specifier
returned by Stage 1". Previously this was a very simple check whether the next
character was in a set of allowed characters. This could lead to Stage 2
accumulating character sequences such as "1.2f" and passing them to strtold
(Stage 3).
https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.3

Stage 3 can fail, however, if the entire character sequence from Stage 2 is not
used in the conversion. For example, the "f" in "1.2f" is not used.
https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.4

As a result, parsing a sequence like "1.2f" would return value 0.0 with failbit
set.
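
The Stage 3 rule described above can be modelled in a few lines. This is a minimal sketch, not the libc++ code: `convert_whole_field` and its `failed` flag (standing in for `failbit`) are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical model of Stage 3: convert the field accumulated by Stage 2
// with strtold, and report failure unless the whole field was consumed.
// `failed` stands in for ios_base::failbit.
long double convert_whole_field(const std::string& field, bool& failed) {
    const char* begin = field.c_str();
    char* end = 0;
    const long double value = std::strtold(begin, &end);
    failed = field.empty() || end != begin + field.size();
    return failed ? 0.0L : value;
}

void example_usage() {
    bool failed = false;
    // strtold stops at the 'f', so the field is not fully consumed.
    assert(convert_whole_field("1.2f", failed) == 0.0L && failed);
    // Without the trailing 'f' the whole field is used.
    (void)convert_whole_field("1.2", failed);
    assert(!failed);
}
```

If Stage 2 stops before the "f", the same conversion succeeds, which is the behaviour this change aims for.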

This change improves the checks made in Stage 2, determining what is passed to
Stage 3.

Hex digits are only considered valid if "0x" has been seen

INFINITY value is recognised

Characters in INFINITY and NAN are only valid in sequence. This is done by checking one character backwards, which has obvious limitations.
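
To make the one-character lookback and its limitation concrete, here is a simplified sketch. `allowed_after` and `accepted_by_lookback` are hypothetical helpers written for this illustration; they follow the predecessor rules described in this revision but ignore locale, digits and signs in the middle of a field.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical one-character lookback: may `c` follow `prev` inside
// INF, INFINITY or NAN? Case is ignored; prev == '\0' means "start of field".
bool allowed_after(char prev, char c) {
    const char P = static_cast<char>(std::toupper(static_cast<unsigned char>(prev)));
    const char C = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    const bool at_start = prev == '\0' || prev == '+' || prev == '-';
    switch (C) {
    case 'I': return at_start || P == 'F' || P == 'N';
    case 'N': return at_start || P == 'I' || P == 'A';
    case 'F': return P == 'N';
    case 'T': return P == 'I';
    case 'Y': return P == 'T';
    case 'A': return P == 'N';
    default:  return false;
    }
}

// True if every character passes the lookback check.
bool accepted_by_lookback(const std::string& s) {
    char prev = '\0';
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (!allowed_after(prev, s[i])) return false;
        prev = s[i];
    }
    return !s.empty();
}
```

Because only the previous character is inspected, this check accepts `"INFINITY"` and `"NAN"` but also chains such as `"NANANAN"` or `"INAN"`, which are not valid floating-point spellings.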

New tests are added. The old ones are preserved but refactored.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tmatheson requested review of this revision. · Mar 22 2021, 10:18 AM

tmatheson created this revision.

Herald added a project: Restricted Project. · Mar 22 2021, 10:18 AM

Herald added a reviewer: Restricted Project.

Herald added a subscriber: libcxx-commits.

Minor change to tests

tmatheson added reviewers: • Quuxplusone, zoecarver, miyuki. · Mar 22 2021, 10:33 AM

miyuki added inline comments. · Mar 22 2021, 10:33 AM

libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_float.pass.cpp
55	I think there is a reason behind having many similar-looking blocks: each assert() call is located on a separate line, so an assertion failure message will indicate which assertion failed. Factoring out all checks into a single function will make error messages much less informative.

Harbormaster completed remote builds in B95032: Diff 332350. · Mar 22 2021, 11:47 AM

Harbormaster completed remote builds in B95035: Diff 332353. · Mar 22 2021, 12:13 PM

Add tests for double and long double
Replace lambda with macro (there are C++03 builds, and filename/line number are not output when using lambda)
Remove constexpr (C++03 builds)
Undo refactoring of old tests
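
The reason a macro keeps failure messages informative, while a lambda or helper function does not, can be sketched as follows. `check_at` and `CHECK_AT` are hypothetical names for illustration and are not the helpers used in the actual tests; the code avoids C++11 features so it would also build in C++03 mode.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns an empty string on success, otherwise a
// message naming the file and line that were passed in.
inline std::string check_at(bool ok, const char* expr, const char* file, int line) {
    if (ok) return std::string();
    std::ostringstream msg;
    msg << file << ':' << line << ": check failed: " << expr;
    return msg.str();
}

// The macro expands __FILE__ and __LINE__ at the place where it is used,
// so each check reports its own call site. Code inside a shared function
// or lambda would always report the line inside that function instead.
#define CHECK_AT(cond) check_at((cond), #cond, __FILE__, __LINE__)
```

A failing `CHECK_AT(1 + 1 == 3)` yields a message containing the calling file, the calling line and the text `1 + 1 == 3`.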

Tidy up tests

Harbormaster completed remote builds in B95223: Diff 332603. · Mar 23 2021, 4:51 AM

Harbormaster completed remote builds in B95230: Diff 332612. · Mar 23 2021, 5:05 AM

Replace lambda in header with macro

Also ping

Harbormaster completed remote builds in B96293: Diff 334122. · Mar 30 2021, 8:34 AM

clang-format

tmatheson added a reviewer: ldionne. · Mar 30 2021, 10:09 AM

Harbormaster completed remote builds in B96342: Diff 334185. · Mar 30 2021, 10:20 AM

typo

Harbormaster completed remote builds in B96495: Diff 334396. · Mar 31 2021, 10:34 AM

Update ABI symbols to account for added parameter in stage2_float_loop
Explicit cast in tests to avoid error when compiling with GCC

Ping

• Quuxplusone requested changes to this revision. · Apr 7 2021, 8:02 AM

• Quuxplusone added inline comments.

libcxx/include/locale
1097	C++ doesn't support VLAs; you'd have to put something here that's a constant-expression. Do I understand correctly that the majority of this patch is just changing `32` and `33` to `36` and `37` respectively? Could you just do that in the simplest possible way? E.g. here, change `char_type __atoms[32];` to `char_type __atoms[36];`. That'll help focus attention on whatever details are actually important.
Meanwhile, the important change seems to be that you're adding those extra 4 characters for `"tTyY"`, so that you can parse not only `"INF"` and `"NAN"` but also `"INFINITY"` (producing `INF`). Is this required by the Standard? Why didn't we have any tests for it before now?
I tentatively suggest that you split out the `"INFINITY"` change+test into its own PR (with a summary that cites chapter and verse for why this is needed); and then let's see what remains in this PR after that.
libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp
57	`INFxyz`, `INFinity`, `INFinite`, `INFiNiTy`
66	`Shouldn't`

This revision now requires changes to proceed. · Apr 7 2021, 8:02 AM

miyuki added inline comments. · Apr 7 2021, 8:09 AM

libcxx/include/locale
536	I think it would be better to keep 4-space indentation for consistency with the rest of the file. See https://github.com/llvm/llvm-project/commit/7004d6664efde9d1148ed677649593f989cc6056
543	Would it be easier to handle these special cases outside of the loop?

miyuki added inline comments. · Apr 7 2021, 8:22 AM

libcxx/include/locale
1097	I guess char_type __atoms[__num_get_base::__n_atoms_float]; will work. We already have something similar on line 1089: unsigned __g[__num_get_base::__num_get_buf_sz];

• Quuxplusone added inline comments. · Apr 7 2021, 9:00 AM

libcxx/include/locale
1097	(For the record, I don't want to see `char_type __atoms[__num_get_base::__n_atoms_float]`; I want to see `char_type __atoms[36];`, in a separate PR, with explanation of why we need to parse `INFINITY` as a float, and appropriate tests.)

tmatheson marked an inline comment as done. · Apr 7 2021, 9:29 AM

tmatheson added inline comments.

libcxx/include/locale
536	I agree, but the CI was failing if I didn't clang-format the patch.
543	Yes, and I tried but there are two problems with doing so. First it requires an even bigger refactor to get it to work. But more importantly, the standard is pretty over-specified here, to the point of essentially dictating the algorithm to use: https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.2 Therefore while moving these special cases outside of the loop would probably make a better algorithm, it would be a deviation from the standard.
1097	It seemed slightly more readable with one less magic number, but I can change it. This whole section of code is pretty obscure.
String should be converted to float by the rules of strtold: https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.3
INFINITY is parsed by strtold as described here: https://en.cppreference.com/w/c/string/byte/strtof (C11 standard, "7.22.1.3 The strtod, strtof, and strtold functions")
I don't know why there are no tests for it, but the tests in these files are pretty minimal and haven't been significantly changed since "libcxx initial import".
I will remove the INFINITY stuff from this PR.

Remove handling of INFINITY and associated tests, add a few tests for variations of INF

tmatheson marked 4 inline comments as done. · Apr 7 2021, 10:16 AM

tmatheson added inline comments.

libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp
57	Added INFxyz and some crazy casing for INF. It is worth noting that (by my reading of the standard, at least) `INFinite` is required to fail because:
- Stage 1 will select `%g` as the format specifier.
- Stage 2 will keep consuming characters until it reaches the `'e'`, which is the first character that is not valid at that point in the sequence.
- Stage 3 will then try to process the string `"INFinit"` and should only process the first 3 characters (`INF`).
Therefore the whole string will not be read, and `num_get` should return 0.0 with `failbit`. I suggest we save that discussion for the INFINITE PR though.

• Quuxplusone added inline comments. · Apr 7 2021, 11:12 AM

libcxx/include/locale
536	The clang-format CI step is non-fatal; it's just there to tell you what parts fail clang-format. It is not intended as a black-and-white gatekeeper, and requests-for-consistency from real humans should always take precedence over the tool's suggestions — Asimov's Second Law applies. :)
libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp
57	There are two reasons for testing `"INFINITE"`:
- There is a right answer. Test that we produce the right answer.
- It's a "garden-path" input: we want to test that the parser doesn't get confused by seeing "INFINI...", and will correctly backtrack to consume only the "INF" part, instead of failing when it fails to find a "Y" character at the end of the string.
So, please test its behavior (in the appropriate PR which it sounds like you're splitting out; nice!). I expect that behavior to resemble the behavior for `"INFxyz"`; but if it doesn't, then we should investigate why.
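
For reference, how much of a garden-path input the C library itself consumes can be checked directly. This is a small sketch assuming a C library whose `strtod` follows the C11 rule of taking the longest initial subsequence of the expected form; `consumed_by_strtod` is a hypothetical helper introduced for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: number of leading characters of `text` that
// strtod uses for the conversion.
std::size_t consumed_by_strtod(const char* text) {
    char* end = 0;
    (void)std::strtod(text, &end);
    return static_cast<std::size_t>(end - text);
}

void example_usage() {
    assert(consumed_by_strtod("INFINITY") == 8);  // whole word
    assert(consumed_by_strtod("INFINITE") == 3);  // falls back to "INF"
    assert(consumed_by_strtod("INFxyz") == 3);    // "INF" then junk
}
```

`strtod` can fall back to "INF" because it sees the whole string at once; Stage 2 of `num_get` decides character by character, which is where the two diverge.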

Update formatting to match the rest of the file

tmatheson marked 2 inline comments as done. · Apr 7 2021, 12:42 PM

tmatheson added inline comments.

libcxx/include/locale
536	Sorry my mistake, I thought it had caused one of the failures I had earlier
libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_float.pass.cpp
55	This was addressed by using a macro for new tests and not refactoring the old tests.

Harbormaster completed remote builds in B97477: Diff 335762. · Apr 7 2021, 8:04 PM

Harbormaster completed remote builds in B97543: Diff 335857. · Apr 8 2021, 3:12 AM

Harbormaster completed remote builds in B97578: Diff 335906. · Apr 8 2021, 3:47 AM

Ping

Update abilist for MacOS C++20

Harbormaster completed remote builds in B98656: Diff 337394. · Apr 14 2021, 6:00 AM

Ping. @Quuxplusone were there any outstanding issues you wanted me to address?

Also, can anyone suggest what I should do about the ABI issues? The function I've changed (__num_get<_CharT>::__stage2_float_loop) doesn't seem like it should be part of the public ABI. After updating the expected names there are still CI failures on MacOS. Should the changes be wrapped with the macros described here? https://libcxx.llvm.org/docs/DesignDocs/ABIVersioning.html

Friendly ping

Ping

In D99091#2704411, @tmatheson wrote:

Also, can anyone suggest what I should do about the ABI issues? The function I've changed (__num_get<_CharT>::__stage2_float_loop) doesn't seem like it should be part of the public ABI. After updating the expected names there are still CI failures on MacOS. Should the changes be wrapped with the macros described here? https://libcxx.llvm.org/docs/DesignDocs/ABIVersioning.html

Yes the additional argument changes the ABI and should be an opt-in for the user. This can be done by adding a new define in include/__config in the block #if defined(_LIBCPP_ABI_UNSTABLE) || _LIBCPP_ABI_VERSION >= 2.

Place breaking ABI change behind a macro

Yes the additional argument changes the ABI and should be an opt-in for the user. This can be done by adding a new define in include/__config in the block #if defined(_LIBCPP_ABI_UNSTABLE) || _LIBCPP_ABI_VERSION >= 2.

Thank you @Mordante, the change is now hidden behind the ABI flag.

Harbormaster completed remote builds in B103496: Diff 344069. · May 10 2021, 10:43 AM

Ping @Quuxplusone

I left some more minor comments... but FYI, this patch's actual functional change is above my pay grade. You'll have to interest @ldionne or someone like that in reviewing its actual functional change.
If the intent is speed, it would help for you to add to this PR a benchmark following the pattern in libcxx/benchmarks/.

libcxx/include/locale
656	`#define _Toupper(x) ((x) & 0x5F)` — "`__UPPERCASE`" reads to me like it's testing uppercaseness. Separately: check throughout for `/preceed/preced/`.
libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_float.pass.cpp
50	I'd still like to see the test strings `"INFINITE"` and `"INFINITY"` somewhere in these tests. Also `INAN`, now that I think of it (because it looks like it's gonna be "INF" and then switches to "NAN" in the middle) — and any other "white-box" test cases you can think of. Also every-possible-prefix-of-a-correct string: `0x`, `123e`, `123e+`, `123e-`. (Maybe these are already covered somewhere?)
libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_long_double.pass.cpp
43	Is this ever defined during testing, though?
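
As background for the `& 0x5F` masking discussed above, the following sketch shows what the mask does to a few ASCII characters. `ascii_mask_upper` is a hypothetical name used only here; the point is that the mask maps ASCII letters to upper case but is not a general-purpose `toupper`.

```cpp
#include <cassert>

// Clears bits 0x20 and 0x80. For ASCII letters this yields the upper-case
// letter; for other characters the result is generally a different,
// unrelated code.
inline char ascii_mask_upper(char c) {
    return static_cast<char>(c & 0x5F);
}

void example_usage() {
    assert(ascii_mask_upper('e') == 'E');   // 0x65 -> 0x45
    assert(ascii_mask_upper('E') == 'E');   // already upper case
    assert(ascii_mask_upper('+') == 0x0B);  // 0x2B -> 0x0B, no longer '+'
    assert(ascii_mask_upper('1') == 0x11);  // 0x31 -> 0x11, no longer a digit
}
```

This is why comparisons against `'+'`, `'-'` or digits have to use the unmasked character, while comparisons against letters can use the masked one.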

Thanks @Quuxplusone. If @ldionne can't be tempted into taking a look is there anyone else appropriate?

This patch is about correctness (still a long way to go, see below) rather than speed/performance so I won't add a benchmark.

libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_float.pass.cpp
50	I'll try to explain what the issue is with adding "infinite" and "infinity" tests -- besides the fact that "infinity" worked before I removed it on request :)
On one hand, we have the ideal behaviour, which is that `do_get` handles anything that `strtod` can handle. Then we have the over-specified wording in the standard, which actually defines `do_get` in three stages, and requires us to (paraphrasing) "loop until we reach an invalid character, then stop and pass the substring to `strtod` for parsing." (The actual wording: "a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1. If so, it is accumulated.") This wording in the standard is probably why we have this slightly bizarre coding style in the first place, with the body of the loop factored out (and the signature added to the ABI).
With a garden path input like "infinite" you need to be able to backtrack in stage 2, and reject all the characters that you previously considered valid (e.g. "init") until you get to a valid substring. The parser described in the standard is simply not up to the job. You might think we can err on the side of caution, and accumulate more characters than we need, and let `strtod` use what it wants (e.g. halt stage 2 after accumulating "infinit" and let `strtod` just parse "inf"). But no, it must return zero and give you an error: https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.4
As such I can see no way that we can actually meet the following criteria simultaneously:
- Be compliant with the standard, e.g. stick to the algorithm it describes
- Handle all valid floating point inputs
- Always return the same result as `strtod` would
The initial problem I wanted to solve is that "1.2f" is badly parsed. This is not a valid floating point number. "1.2" is valid. The "f" should be ignored, i.e. not passed to stage 3. It was not ignored because "f" is a valid hex digit. But the existing code had no way of keeping track of whether hex digits are valid at the current point in the string. I have added this here because it is relatively simple to do without a complete refactor, and it still looks like the standard if you squint.
Adding the ability to backtrack to handle garden path inputs is a significant deviation from the standard imo. This change should increase the number of correctly handled cases and hopefully not break any that were handled correctly before, but it is not perfect or complete.
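
The idea of tracking whether hex digits are currently valid can be illustrated with a heavily simplified Stage 2 loop. This is a sketch under stated assumptions, not the libc++ implementation: `accumulate_float_field` is a hypothetical function, and locale handling, digit grouping, INF and NAN are all left out.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical, simplified Stage 2: accumulate characters of `in` while
// each one is allowed as the next character of a floating-point field.
std::string accumulate_float_field(const std::string& in) {
    std::string out;
    bool hex = false;       // set once 'x' has been accepted after a '0'
    bool seen_exp = false;  // set once the exponent character was accepted
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        const char prev = out.empty() ? '\0' : out[out.size() - 1];
        const char lower = static_cast<char>(std::tolower(c));
        const char prev_lower =
            static_cast<char>(std::tolower(static_cast<unsigned char>(prev)));
        const char exp_char = hex ? 'p' : 'e';  // exponent marker for this field
        bool ok = false;
        if (std::isdigit(c)) {
            ok = true;
        } else if (c == '+' || c == '-') {
            // A sign may start the field or directly follow the exponent marker.
            ok = out.empty() || (seen_exp && prev_lower == exp_char);
        } else if (c == '.') {
            ok = !seen_exp && out.find('.') == std::string::npos;
        } else if (lower == 'x') {
            // 'x' is allowed once, and only directly after a '0'.
            ok = !hex && prev == '0';
            if (ok) hex = true;
        } else if (lower == exp_char) {
            ok = !seen_exp && !out.empty();
            if (ok) seen_exp = true;
        } else if (hex && !seen_exp && std::isxdigit(c)) {
            ok = true;  // a-f are digits only after "0x"
        }
        if (!ok) break;  // first disallowed character ends Stage 2
        out += in[i];
    }
    return out;
}
```

With this tracking, `"1.2f"` is cut at the "f" because no "0x" was seen, while `"0x1fz"` keeps the "f" as a hex digit and stops at the "z".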

• Quuxplusone added inline comments. · May 17 2021, 11:55 AM

libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_float.pass.cpp
50	I'd still like to see those tests, because otherwise we have codepaths that we're not exercising, which means they could segfault, or worse, for all we know. I see two not-mutually-exclusive alternatives:
- Figure out the "least common correct denominator" of all implementations, and add a minimal coverage test in test/std/ for it. If the least common denominator is just "we should be able to call it without crashing," then test `try { (void)thething("INFINITY"); } catch (...) {}`. But we should still hit the codepath.
- And/or, figure out the "current behavior of libc++," and add a regression test in test/libcxx/, so that we test our behavior, and then we'll know if it ever regresses by accident. Because we do care about regressions in this area, right? (That's why you're introducing the ABI flag in the first place.)
The responsible thing to do would probably be to add both of these approaches, honestly, now that I've said them out loud.
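
A minimal coverage check of the first kind might look like the sketch below. It calls the standard `std::num_get<char>::get` through a hypothetical wrapper, `parse_double`, and asserts only on behaviour that should not differ between implementations; the `"INFINITY"` call is made purely to exercise the code path.

```cpp
#include <cassert>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

struct ParseResult {
    double value;
    std::ios_base::iostate state;
};

// Hypothetical wrapper: run num_get<char>::get over `text` in the "C" locale.
ParseResult parse_double(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    const std::num_get<char>& facet =
        std::use_facet<std::num_get<char> >(in.getloc());
    ParseResult result = {0.0, std::ios_base::goodbit};
    (void)facet.get(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>(),
                    in, result.state, result.value);
    return result;
}

void coverage_smoke_test() {
    // Uncontroversial input: must parse without failbit.
    const ParseResult plain = parse_double("1.5");
    assert(plain.value == 1.5);
    assert((plain.state & std::ios_base::failbit) == 0);
    // Contested input: only require that the call returns normally.
    try { (void)parse_double("INFINITY"); } catch (...) {}
}
```

What value and state `"INFINITY"` should produce is exactly the open question in this thread, so the sketch deliberately makes no claim about it.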

Ping. @ldionne? It doesn't seem like there is a lot of interest in this fix, since it's been up for over 2 months now in more or less the same form. I'm happy to keep making the suggested changes if it has a chance of landing, otherwise please let me know and I will just close it.

libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_float.pass.cpp

Sure, I can add something like this to test/std:

// FIXME these are all currently known to give the wrong answer
TEST("INFINITY", 8, 0.0, ios.failbit);

// FIXME it is unclear what the correct answer is in these cases
TEST("INAN", 1, 0.0, ios.failbit);
TEST("INFINITY", 3, INFINITY, ios.goodbit);

tmatheson abandoned this revision. · Dec 16 2021, 6:17 AM

Somewhat embarrassing, but this simply went under my radar. I am aware of a couple of correctness bugs against our localization library/iostream and I am definitely interested in patches that would fix those. If you are still willing to, we can reopen and review this.

This overall LGTM, but I'd like to enable this change slightly differently. It should allow us to use the new function for all software built for a sufficiently recent deployment target. What we could do is keep the old implementation of __stage2_float_loop without the bool& hex parameter inside the dylib only and use a bit of cleverness to keep things working on older versions, like so:

///////////////////////////////////////////////////////////////////////////////////////////////////
// in <__config>, add (grep for _LIBCPP_ABI_ENABLE_ADDITIONAL_IOSTREAM_EXPLICIT_INSTANTIATIONS_1):
///////////////////////////////////////////////////////////////////////////////////////////////////
#if defined(_LIBCPP_BUILDING_LIBRARY) || defined(_LIBCPP_ABI_UNSTABLE) || _LIBCPP_ABI_VERSION >= 2
// Explain what's going on
# define _LIBCPP_ABI_LOCAL_NUM_GET_NEW_STAGE2_FLOAT_LOOP
#endif


///////////////////////////////////////////////////////////////////////////////////////////////////
// in <locale>:
///////////////////////////////////////////////////////////////////////////////////////////////////
template <class _CharT>
struct __num_get : protected __num_get_base {
#if defined(_LIBCPP_ABI_LOCAL_NUM_GET_NEW_STAGE2_FLOAT_LOOP)
    // your new implementation
    static int __stage2_float_loop(_CharT, bool&, char&, char*, char*&, _CharT, _CharT, const string&, unsigned*, unsigned*&, unsigned&, _CharT*, bool& __hex);
#else
    // new signature but forwarding to the old function. now you can change all the callers of this
    // function to use the new signature unconditionally, effectively making this hack more local.
    _LIBCPP_HIDE_FROM_ABI static inline 
    int __stage2_float_loop(_CharT, bool&, char&, char*, char*&, _CharT, _CharT, const string&, unsigned*, unsigned*&, unsigned&, _CharT*, bool& __hex) {
        return __stage2_float_loop(same args but drop __hex);
    }
#endif

    // If we are building the dylib, we keep the old function around for backwards compatibility.
    // If we are building for a target that doesn't support the new implementation, use the old function.
#if defined(_LIBCPP_BUILDING_LIBRARY) || !defined(_LIBCPP_ABI_LOCAL_NUM_GET_NEW_STAGE2_FLOAT_LOOP)
    static int __stage2_float_loop(_CharT, bool&, char&, char*, char*&, _CharT, _CharT, const string&, unsigned*, unsigned*&, unsigned&, _CharT*);
#endif
};

Benefits of this approach:

All callers can pretend they are using the new, fixed version.
The dylib retains the old function implementation, so we don't break the ABI.
By default, all code will keep using the old implementation, but the new implementation will also start being exported by the shared library.
Once the new implementation has shipped, vendors can go back and enable _LIBCPP_ABI_LOCAL_NUM_GET_NEW_STAGE2_FLOAT_LOOP based on whether they know the user is compiling for a target where the new implementation existed.

Concretely, this means that for example I can go back in a bit of time and enable the fix for anyone that's deploying to a recent macOS/iOS/whateverOS. The same can be done for other platforms (although I don't know that other vendors use these sorts of tricks that are available to them).

P.S.: The localization code is crazy and you're a real warrior for jumping into it -- I'm really sorry you didn't catch my attention the first time around.

[Github PR transition cleanup]

Revived as https://github.com/llvm/llvm-project/pull/65168

Herald added a project: Restricted Project. · Sep 1 2023, 6:14 AM

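Revision Contents

Path	Size
libcxx/include/locale	121 lines
libcxx/src/locale.cpp	3 lines
libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp	33 lines
libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_float.pass.cpp	33 lines
libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_float_common.h	22 lines
libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_long_double.pass.cpp	34 lines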

Diff 334122

libcxx/include/locale

[... 361 lines hidden ...]	__scan_keyword(_InputIterator& __b, _InputIterator __e,
return __kb;		return __kb;
}		}

struct _LIBCPP_TYPE_VIS __num_get_base		struct _LIBCPP_TYPE_VIS __num_get_base
{		{
static const int __num_get_buf_sz = 40;		static const int __num_get_buf_sz = 40;

static int __get_base(ios_base&);		static int __get_base(ios_base&);
static const char __src[33];		static const int __n_atoms_float = 36; // float has the largest character set
		static const char __src[__n_atoms_float + 1]; // includes null terminator
};		};

_LIBCPP_FUNC_VIS		_LIBCPP_FUNC_VIS
void __check_grouping(const string& __grouping, unsigned* __g, unsigned* __g_end,		void __check_grouping(const string& __grouping, unsigned* __g, unsigned* __g_end,
ios_base::iostate& __err);		ios_base::iostate& __err);

template <class _CharT>		template <class _CharT>
struct __num_get		struct __num_get
: protected __num_get_base		: protected __num_get_base
{		{
static string __stage2_float_prep(ios_base& __iob, _CharT* __atoms, _CharT& __decimal_point,		static string __stage2_float_prep(ios_base& __iob, _CharT* __atoms, _CharT& __decimal_point,
_CharT& __thousands_sep);		_CharT& __thousands_sep);

static int __stage2_float_loop(_CharT __ct, bool& __in_units, char& __exp,		static int __stage2_float_loop(_CharT __ct, bool& __in_units, char& __exp,
char* __a, char*& __a_end,		char* __a, char*& __a_end,
_CharT __decimal_point, _CharT __thousands_sep,		_CharT __decimal_point, _CharT __thousands_sep,
const string& __grouping, unsigned* __g,		const string& __grouping, unsigned* __g,
unsigned& __g_end, unsigned& __dc, _CharT __atoms);		unsigned& __g_end, unsigned& __dc, _CharT __atoms,
		bool& __hex);
#ifndef _LIBCPP_ABI_OPTIMIZED_LOCALE_NUM_GET		#ifndef _LIBCPP_ABI_OPTIMIZED_LOCALE_NUM_GET
static string __stage2_int_prep(ios_base& __iob, _CharT* __atoms, _CharT& __thousands_sep);		static string __stage2_int_prep(ios_base& __iob, _CharT* __atoms, _CharT& __thousands_sep);
static int __stage2_int_loop(_CharT __ct, int __base, char* __a, char*& __a_end,		static int __stage2_int_loop(_CharT __ct, int __base, char* __a, char*& __a_end,
unsigned& __dc, _CharT __thousands_sep, const string& __grouping,		unsigned& __dc, _CharT __thousands_sep, const string& __grouping,
unsigned* __g, unsigned& __g_end, _CharT __atoms);		unsigned* __g, unsigned& __g_end, _CharT __atoms);

#else		#else
static string __stage2_int_prep(ios_base& __iob, _CharT& __thousands_sep)		static string __stage2_int_prep(ios_base& __iob, _CharT& __thousands_sep)
[... 45 lines hidden ...]
#endif		#endif

template <class _CharT>		template <class _CharT>
string		string
__num_get<_CharT>::__stage2_float_prep(ios_base& __iob, _CharT* __atoms, _CharT& __decimal_point,		__num_get<_CharT>::__stage2_float_prep(ios_base& __iob, _CharT* __atoms, _CharT& __decimal_point,
_CharT& __thousands_sep)		_CharT& __thousands_sep)
{		{
locale __loc = __iob.getloc();		locale __loc = __iob.getloc();
use_facet<ctype<_CharT> >(__loc).widen(__src, __src + 32, __atoms);		use_facet<ctype<_CharT> >(__loc).widen(__src, __src + __n_atoms_float, __atoms);
const numpunct<_CharT>& __np = use_facet<numpunct<_CharT> >(__loc);		const numpunct<_CharT>& __np = use_facet<numpunct<_CharT> >(__loc);
__decimal_point = __np.decimal_point();		__decimal_point = __np.decimal_point();
__thousands_sep = __np.thousands_sep();		__thousands_sep = __np.thousands_sep();
return __np.grouping();		return __np.grouping();
}		}

template <class _CharT>		template <class _CharT>
int		int
[... 48 lines hidden ...]	#endif
++__dc;		++__dc;
return 0;		return 0;
}		}

template <class _CharT>		template <class _CharT>
int		int
__num_get<_CharT>::__stage2_float_loop(_CharT __ct, bool& __in_units, char& __exp, char* __a, char*& __a_end,		__num_get<_CharT>::__stage2_float_loop(_CharT __ct, bool& __in_units, char& __exp, char* __a, char*& __a_end,
_CharT __decimal_point, _CharT __thousands_sep, const string& __grouping,		_CharT __decimal_point, _CharT __thousands_sep, const string& __grouping,
unsigned* __g, unsigned& __g_end, unsigned& __dc, _CharT __atoms)		unsigned* __g, unsigned& __g_end, unsigned& __dc, _CharT __atoms, bool& __hex)
{		{
		#define __UPPERCASE(X) ((X) & 0x5F)

if (__ct == __decimal_point)		if (__ct == __decimal_point)
{		{
if (!__in_units)		if (!__in_units)
return -1;		return -1;
__in_units = false;		__in_units = false;
*__a_end++ = '.';		*__a_end++ = '.';
if (__grouping.size() != 0 && __g_end-__g < __num_get_buf_sz)		if (__grouping.size() != 0 && __g_end-__g < __num_get_buf_sz)
*__g_end++ = __dc;		*__g_end++ = __dc;
return 0;		return 0;
}		}
if (__ct == __thousands_sep && __grouping.size() != 0)		if (__ct == __thousands_sep && __grouping.size() != 0)
{		{
if (!__in_units)		if (!__in_units)
return -1;		return -1;
if (__g_end-__g < __num_get_buf_sz)		if (__g_end-__g < __num_get_buf_sz)
{		{
*__g_end++ = __dc;		*__g_end++ = __dc;
__dc = 0;		__dc = 0;
}		}
return 0;		return 0;
}		}
ptrdiff_t __f = find(__atoms, __atoms + 32, __ct) - __atoms;
if (__f >= 32)		ptrdiff_t __f = find(__atoms, __atoms + __n_atoms_float, __ct) - __atoms;
		const bool __is_digit = __hex ? __f < 22 : __f < 10;
		const bool __first = __a_end == __a;
		if(__f >= __n_atoms_float)
return -1;		return -1;
char __x = __src[__f];		char __x = __src[__f];
		char __X = __UPPERCASE(__x);

		// Return early -1 for any character that is not valid at this point
if (__x == '-' \|\| __x == '+')		if (__x == '-' \|\| __x == '+')
{		{
if (__a_end == __a \|\| (__a_end[-1] & 0x5F) == (__exp & 0x7F))		// Previous character must be __exp, which was marked as seen setting bit 0x80
{		if (!__first && __UPPERCASE(__a_end[-1]) != (__exp & 0x7F))
*__a_end++ = __x;
return 0;
}
return -1;		return -1;
}		}
if (__x == 'x' \|\| __x == 'X')		else if (__x == 'x' \|\| __x == 'X')
		{
		// Can't have 'x' or 'X' as the first character
		if(__first)
		return -1;
		// Must be preceeded by a '0'
		if(__a_end[-1] != __atoms[0])
		return -1;
		// Can't have multiple occurrences of 'x'
		if(__hex)
		return -1;
		__hex = true;
__exp = 'P';		__exp = 'P';
else if ((__x & 0x5F) == __exp)		}
		else if (__X == __exp)
{		{
		// Can't have e/E/p/P as first character
		if (__first)
		return -1;
		// Mark exponent as seen
__exp \|= (char) 0x80;		__exp \|= (char) 0x80;
if (__in_units)		if (__in_units)
{		{
__in_units = false;		__in_units = false;
if (__grouping.size() != 0 && __g_end-__g < __num_get_buf_sz)		if (__grouping.size() != 0 && __g_end-__g < __num_get_buf_sz)
*__g_end++ = __dc;		*__g_end++ = __dc;
}		}
}		}
		else if (!__is_digit) {
		// Not '.' or __thousands_sep or '+' or '-' or 'x' or __exp or digit.
		// Special handling for the characters in INF/INFINITY/NAN.
		// These must appear at the start of the sequence, possibly preceeded by + or -.
		// Look back one character to check that these are part of a valid sequence.
		// FIXME currently can't handle NANANANAN.

		if (__first) {
		// + and - as first character are handled in a separate branch.
		if (__X != 'I' && __X != 'N')
		return -1;
		} else {
		char __prev = __src[find(__atoms, __atoms + __n_atoms_float, __a_end[-1]) - __atoms];
		char __PREV = __UPPERCASE(__prev);

		// Rule out special characters out of sequence INFINITY or NAN.
		if (__X == 'I')
		{
		if (__prev != '+' && __prev != '-' && __PREV != 'F' && __PREV != 'N')
		return -1;
		}
		else if (__X == 'N')
		{
		if (__prev != '+' && __prev != '-' && __PREV != 'I' && __PREV != 'A')
		return -1;
		}
		else if (__X == 'F')
		{
		if (__PREV != 'N')
		return -1;
		}
		else if (__X == 'T')
		{
		if (__PREV != 'I')
		return -1;
		}
		else if (__X == 'Y')
		{
		if (__PREV != 'T')
		return -1;
		}
		else if (__X == 'A')
		{
		if (__PREV != 'N')
		return -1;
		}
		else if(!__is_digit)
		{
		return -1;
		}
		}
		}

		// "...c is allowed as the next character of an input field of the conversion specifier returned by Stage 1."
*__a_end++ = __x;		*__a_end++ = __x;
if (__f >= 22)
return 0;		if (__is_digit)
++__dc;		++__dc;

return 0;		return 0;
		#undef __UPPERCASE
}		}

_LIBCPP_EXTERN_TEMPLATE_EVEN_IN_DEBUG_MODE(struct _LIBCPP_EXTERN_TEMPLATE_TYPE_VIS __num_get<char>)		_LIBCPP_EXTERN_TEMPLATE_EVEN_IN_DEBUG_MODE(struct _LIBCPP_EXTERN_TEMPLATE_TYPE_VIS __num_get<char>)
_LIBCPP_EXTERN_TEMPLATE_EVEN_IN_DEBUG_MODE(struct _LIBCPP_EXTERN_TEMPLATE_TYPE_VIS __num_get<wchar_t>)		_LIBCPP_EXTERN_TEMPLATE_EVEN_IN_DEBUG_MODE(struct _LIBCPP_EXTERN_TEMPLATE_TYPE_VIS __num_get<wchar_t>)

template <class _CharT, class _InputIterator = istreambuf_iterator<_CharT> >		template <class _CharT, class _InputIterator = istreambuf_iterator<_CharT> >
class _LIBCPP_TEMPLATE_VIS num_get		class _LIBCPP_TEMPLATE_VIS num_get
: public locale::facet,		: public locale::facet,
private __num_get<_CharT>		private __num_get<_CharT>
{		{
Quuxplusone (inline comment, Not Done): `#define _Toupper(x) ((x) & 0x5F)`: "`__UPPERCASE`" reads to me like it's testing uppercaseness. Separately: check throughout for `/preceed/preced/`.
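For readers unfamiliar with the `& 0x5F` idiom discussed in this comment, here is a minimal sketch of what the mask does, assuming an ASCII execution character set. The names `mask_upper` and `check_mask_upper` are hypothetical and do not appear in the patch. The mask clears bit 5, which maps ASCII lowercase letters to uppercase, but it also alters non-letters, so it only behaves like "to upper" when the input is already known to be a letter.

```cpp
#include <cassert>

// Hypothetical helper mirroring the suggested `(x) & 0x5F` macro; not libc++ code.
// Assumes ASCII: clearing bit 5 (0x20) turns 'a'..'z' into 'A'..'Z'.
constexpr char mask_upper(char c) { return static_cast<char>(c & 0x5F); }

// Small self-check showing where the mask works and where it does not.
inline void check_mask_upper() {
    assert(mask_upper('x') == 'X');  // 0x78 & 0x5F == 0x58
    assert(mask_upper('X') == 'X');  // already uppercase, unchanged
    assert(mask_upper('{') == '[');  // 0x7B & 0x5F == 0x5B: not a letter, still altered
    assert(mask_upper('1') != '1');  // 0x31 & 0x5F == 0x11: digits are mangled
}
```

This is why a comparison such as `(__x & 0x5F) == __exp` is only safe when the caller has already ruled out digits and punctuation.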
public:		public:
typedef _CharT char_type;		typedef _CharT char_type;
typedef _InputIterator iter_type;		typedef _InputIterator iter_type;

_LIBCPP_INLINE_VISIBILITY		_LIBCPP_INLINE_VISIBILITY
explicit num_get(size_t __refs = 0)		explicit num_get(size_t __refs = 0)
: locale::facet(__refs) {}		: locale::facet(__refs) {}

▲ Show 20 Lines • Show All 424 Lines • ▼ Show 20 Lines
_InputIterator		_InputIterator
num_get<_CharT, _InputIterator>::__do_get_floating_point(iter_type __b, iter_type __e,		num_get<_CharT, _InputIterator>::__do_get_floating_point(iter_type __b, iter_type __e,
ios_base& __iob,		ios_base& __iob,
ios_base::iostate& __err,		ios_base::iostate& __err,
_Fp& __v) const		_Fp& __v) const
{		{
// Stage 1, nothing to do		// Stage 1, nothing to do
// Stage 2		// Stage 2
char_type __atoms[32];		char_type __atoms[this->__n_atoms_float];
Quuxplusone (inline comment, Done): C++ doesn't support VLAs; you'd have to put something here that's a constant-expression.

Do I understand correctly that the majority of this patch is just changing `32` and `33` to `36` and `37` respectively? Could you just do that in the simplest possible way? E.g. here

    - char_type __atoms[32];
    + char_type __atoms[36];

That'll help focus attention on whatever details are actually important.

Meanwhile, the important change seems to be that you're adding those extra 4 characters for `"tTyY"`, so that you can parse not only `"INF"` and `"NAN"` but also `"INFINITY"` (producing `INF`). Is this required by the Standard? Why didn't we have any tests for it before now?

I tentatively suggest that you split out the `"INFINITY"` change+test into its own PR (with a summary that cites chapter and verse for why this is needed); and then let's see what remains in this PR after that.

miyuki (inline comment, Not Done): I guess `char_type __atoms[__num_get_base::__n_atoms_float];` will work. We already have something similar on line 1089: `unsigned __g[__num_get_base::__num_get_buf_sz];`

Quuxplusone (inline comment, Done): (For the record, I don't want to see `char_type __atoms[__num_get_base::__n_atoms_float]`; I want to see `char_type __atoms[36];`, in a separate PR, with explanation of why we need to parse `INFINITY` as a float, and appropriate tests.)

tmatheson (author, inline comment, Done): It seemed slightly more readable with one less magic number, but I can change it. This whole section of code is pretty obscure.

String should be converted to float by the rules of strtold: https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.3

INFINITY is parsed by strtold as described here: https://en.cppreference.com/w/c/string/byte/strtof (C11 standard, "7.22.1.3 The strtod, strtof, and strtold functions")

I don't know why there are no tests for it, but the tests in these files are pretty minimal and haven't been significantly changed since "libcxx initial import".

I will remove the INFINITY stuff from this PR.
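The `strtold` behaviour cited above can be probed directly. The following is a minimal sketch, not part of the patch; the helper name `consumed_by_strtold` is hypothetical. It reports how many leading characters `std::strtold` consumes, which is the quantity Stage 3 compares against the length of the Stage 2 buffer. It assumes the default "C" locale. Under C11 7.22.1.3 one would expect the whole of "INFINITY" to be consumed, and only the first three characters of "1.2f" or "INFinite".

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: number of leading characters of `s` that strtold consumes.
// Optionally stores the converted value in `*out`.
inline std::size_t consumed_by_strtold(const char* s, long double* out = nullptr) {
    char* end = nullptr;
    const long double v = std::strtold(s, &end);
    if (out != nullptr)
        *out = v;
    return static_cast<std::size_t>(end - s);
}

// Example use: "INFINITY" should convert to an infinity.
inline void check_infinity_is_parsed() {
    long double v = 0.0L;
    consumed_by_strtold("INFINITY", &v);
    assert(std::isinf(v));
}
```

If Stage 2 hands `strtold` a buffer longer than the count this helper returns, the Stage 3 rule linked above makes the whole conversion fail.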
char_type __decimal_point;		char_type __decimal_point;
char_type __thousands_sep;		char_type __thousands_sep;
string __grouping = this->__stage2_float_prep(__iob, __atoms,		string __grouping = this->__stage2_float_prep(__iob, __atoms,
__decimal_point,		__decimal_point,
__thousands_sep);		__thousands_sep);
string __buf;		string __buf;
__buf.resize(__buf.capacity());		__buf.resize(__buf.capacity());
char* __a = &__buf[0];		char* __a = &__buf[0];
char* __a_end = __a;		char* __a_end = __a;
unsigned __g[__num_get_base::__num_get_buf_sz];		unsigned __g[__num_get_base::__num_get_buf_sz];
unsigned* __g_end = __g;		unsigned* __g_end = __g;
unsigned __dc = 0;		unsigned __dc = 0;
bool __in_units = true;		bool __in_units = true;
char __exp = 'E';		char __exp = 'E';
		bool __hex = false; //< set to true when we see 0x

for (; __b != __e; ++__b)		for (; __b != __e; ++__b)
{		{
if (__a_end == __a + __buf.size())		if (__a_end == __a + __buf.size())
{		{
size_t __tmp = __buf.size();		size_t __tmp = __buf.size();
__buf.resize(2*__buf.size());		__buf.resize(2*__buf.size());
__buf.resize(__buf.capacity());		__buf.resize(__buf.capacity());
__a = &__buf[0];		__a = &__buf[0];
__a_end = __a + __tmp;		__a_end = __a + __tmp;
}		}
if (this->__stage2_float_loop(*__b, __in_units, __exp, __a, __a_end,		if (this->__stage2_float_loop(*__b, __in_units, __exp, __a, __a_end,
__decimal_point, __thousands_sep,		__decimal_point, __thousands_sep,
__grouping, __g, __g_end,		__grouping, __g, __g_end,
__dc, __atoms))		__dc, __atoms, __hex))
break;		break;
}		}
if (__grouping.size() != 0 && __in_units && __g_end-__g < __num_get_base::__num_get_buf_sz)		if (__grouping.size() != 0 && __in_units && __g_end-__g < __num_get_base::__num_get_buf_sz)
*__g_end++ = __dc;		*__g_end++ = __dc;
// Stage 3		// Stage 3
__v = __num_get_float<_Fp>(__a, __a_end, __err);		__v = __num_get_float<_Fp>(__a, __a_end, __err);
// Digit grouping checked		// Digit grouping checked
__check_grouping(__grouping, __g, __g_end, __err);		__check_grouping(__grouping, __g, __g_end, __err);
▲ Show 20 Lines • Show All 3,325 Lines • Show Last 20 Lines

libcxx/src/locale.cpp

Show First 20 Lines • Show All 4,553 Lines • ▼ Show 20 Lines	if (__basefield == ios_base::oct)
return 8;		return 8;
else if (__basefield == ios_base::hex)		else if (__basefield == ios_base::hex)
return 16;		return 16;
else if (__basefield == 0)		else if (__basefield == 0)
return 0;		return 0;
return 10;		return 10;
}		}

const char __num_get_base::__src[33] = "0123456789abcdefABCDEFxX+-pPiInN";		const char __num_get_base::__src[__num_get_base::__n_atoms_float + 1] =
		"0123456789abcdefABCDEFxX+-pPiInNtTyY";

void		void
__check_grouping(const string& __grouping, unsigned* __g, unsigned* __g_end,		__check_grouping(const string& __grouping, unsigned* __g, unsigned* __g_end,
ios_base::iostate& __err)		ios_base::iostate& __err)
{		{
// if the grouping pattern is empty _or_ there are no grouping bits, then do nothing		// if the grouping pattern is empty _or_ there are no grouping bits, then do nothing
// we always have at least a single entry in [__g, __g_end); the end of the input sequence		// we always have at least a single entry in [__g, __g_end); the end of the input sequence
if (__grouping.size() != 0 && __g_end - __g > 1)		if (__grouping.size() != 0 && __g_end - __g > 1)
▲ Show 20 Lines • Show All 1,774 Lines • Show Last 20 Lines

libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp

Show All 15 Lines
#include <locale>		#include <locale>
#include <ios>		#include <ios>
#include <cassert>		#include <cassert>
#include <streambuf>		#include <streambuf>
#include <cmath>		#include <cmath>
#include "test_macros.h"		#include "test_macros.h"
#include "test_iterators.h"		#include "test_iterators.h"
#include "hexfloat.h"		#include "hexfloat.h"
		#include "get_float_common.h"

typedef std::num_get<char, input_iterator<const char*> > F;		typedef std::num_get<char, input_iterator<const char*> > F;

class my_facet		class my_facet
: public F		: public F
{		{
public:		public:
explicit my_facet(std::size_t refs = 0)		explicit my_facet(std::size_t refs = 0)
Show All 12 Lines	protected:
virtual std::string do_grouping() const {return std::string("\1\2\3");}		virtual std::string do_grouping() const {return std::string("\1\2\3");}
};		};

int main(int, char**)		int main(int, char**)
{		{
const my_facet f(1);		const my_facet f(1);
std::ios ios(0);		std::ios ios(0);
double v = -1;		double v = -1;

		// Valid floating point formats where whole string is consumed
		TEST("0x123.4f", 8, hexfloat<double>(0x123, 0x4f, 0), ios.eofbit);
		TEST("inf", 3, INFINITY, ios.goodbit \| ios.eofbit);
		TEST("INFINITY", 8, INFINITY, ios.eofbit \| ios.goodbit);
Quuxplusone (inline comment, Done): `INFxyz`, `INFinity`, `INFinite`, `INFiNiTy`

tmatheson (author, inline comment, Done): Added INFxyz and some crazy casing for INF.

It is worth noting that (by my reading of the standard, at least) `INFinite` is required to fail because:

- Stage 1 will select `%g` as the format specifier
- Stage 2 will keep consuming characters until it reaches the `'e'`, which is the first character that is not valid at that point in the sequence
- Stage 3 will then try to process the string `"INFinit"` and should only process the first 3 characters (`INF`). Therefore the whole string will not be read, and `num_get` should return 0.0 with `failbit`

I suggest we save that discussion for the INFINITE PR though.

Quuxplusone (inline comment, Not Done): The point of testing `"INFINITE"` is for two reasons:

- There is a right answer. Test that we produce the right answer.
- It's a "garden-path" input: we want to test that the parser doesn't get confused by seeing "INFINI...", and will correctly backtrack to consume only the "INF" part, instead of failing when it fails to find a "Y" character at the end of the string.

So, please test its behavior (in the appropriate PR which it sounds like you're splitting out; nice!). I expect that behavior to resemble the behavior for `"INFxyz"`; but if it doesn't, then we should investigate why.

		// Valid floating point formats with unparsed trailing characters
		TEST("123.4f", 5, 123.4, ios.goodbit);
		TEST("123xyz", 3, 123.0, ios.goodbit);
		TEST("0x123.4+", 7, hexfloat<double>(0x123, 0x4, 0), ios.goodbit);
		// TEST("NININININ", 3, NAN, ios.goodbit);
		// TEST("NANANANAN", 3, NAN, ios.goodbit);

		// Shouldn't recognise e, p or x more than once
Quuxplusone (inline comment, Done): `Shouldn't`
		TEST("123.4e-5e-4", 8, 123.4e-5, ios.goodbit);
		TEST("0x123.4p-5p-4", 10, hexfloat<double>(0x123, 0x4, -5), ios.goodbit);
		TEST("0x123x5", 5, hexfloat<double>(0x123, 0x0, 0), ios.goodbit);

		// Invalid (non-float) inputs
		TEST("a", 0, 0.0, ios.failbit);
		TEST("e", 0, 0.0, ios.failbit);
		TEST("f", 0, 0.0, ios.failbit);
		TEST("p", 0, 0.0, ios.failbit);
		TEST("M", 0, 0.0, ios.failbit);
		TEST("{}", 0, 0.0, ios.failbit);
		TEST("x123", 0, 0.0, ios.failbit);

		// Incomplete inputs, i.e. eof before finished parsing
		TEST("-", 1, 0.0, ios.eofbit \| ios.failbit);
		TEST("+", 1, 0.0, ios.eofbit \| ios.failbit);
		TEST("0x123.4p", 8, 0.0, ios.eofbit \| ios.failbit);

{		{
const char str[] = "123";		const char str[] = "123";
assert((ios.flags() & ios.basefield) == ios.dec);		assert((ios.flags() & ios.basefield) == ios.dec);
assert(ios.getloc().name() == "C");		assert(ios.getloc().name() == "C");
std::ios_base::iostate err = ios.goodbit;		std::ios_base::iostate err = ios.goodbit;
input_iterator<const char*> iter =		input_iterator<const char*> iter =
f.get(input_iterator<const char*>(str),		f.get(input_iterator<const char*>(str),
input_iterator<const char*>(str+sizeof(str)),		input_iterator<const char*>(str+sizeof(str)),
▲ Show 20 Lines • Show All 222 Lines • Show Last 20 Lines

libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_float.pass.cpp

	Show All 15 Lines
	#include <locale>			#include <locale>
	#include <ios>			#include <ios>
	#include <cassert>			#include <cassert>
	#include <streambuf>			#include <streambuf>
	#include <cmath>			#include <cmath>
	#include "test_macros.h"			#include "test_macros.h"
	#include "test_iterators.h"			#include "test_iterators.h"
	#include "hexfloat.h"			#include "hexfloat.h"
				#include "get_float_common.h"

	typedef std::num_get<char, input_iterator<const char*> > F;			typedef std::num_get<char, input_iterator<const char*> > F;

	class my_facet			class my_facet
	: public F			: public F
	{			{
	public:			public:
	explicit my_facet(std::size_t refs = 0)			explicit my_facet(std::size_t refs = 0)
	: F(refs) {}			: F(refs) {}
	};			};


	int main(int, char**)			int main(int, char**)
	{			{
	const my_facet f(1);			const my_facet f(1);
	std::ios ios(0);			std::ios ios(0);
	float v = -1;			float v = -1;

				// Valid floating point formats where whole string is consumed
				TEST("0x123.4f", 8, hexfloat<float>(0x123, 0x4f, 0), ios.eofbit);
				TEST("inf", 3, INFINITY, ios.goodbit \| ios.eofbit);
				TEST("INFINITY", 8, INFINITY, ios.eofbit \| ios.goodbit);

				// Valid floating point formats with unparsed trailing characters
				TEST("123.4f", 5, 123.4f, ios.goodbit);
				TEST("123xyz", 3, 123.0f, ios.goodbit);
Quuxplusone (inline comment, Not Done): I'd still like to see the test strings `"INFINITE"` and `"INFINITY"` somewhere in these tests. Also `INAN`, now that I think of it (because it looks like it's gonna be "INF" and then switches to "NAN" in the middle), and any other "white-box" test cases you can think of. Also every-possible-prefix-of-a-correct string: `0x`, `123e`, `123e+`, `123e-`. (Maybe these are already covered somewhere?)
tmatheson (author, inline comment, Done): I'll try to explain what the issue is with adding "infinite" and "infinity" tests -- besides the fact that "infinity" worked before I removed it on request :)

On one hand, we have the ideal behaviour, which is that `do_get` handles anything that `strtod` can handle. Then we have the over-specified wording in the standard, which actually defines `do_get` in three stages, and requires us to (paraphrasing) "loop until we reach an invalid character, then stop and pass the substring to `strtod` for parsing." (The actual wording: "a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1. If so, it is accumulated.") This wording in the standard is probably why we have this slightly bizarre coding style in the first place, with the body of the loop factored out (and the signature added to the ABI).

With a garden path input like "infinite" you need to be able to backtrack in stage 2, and reject all the characters that you previously considered valid (e.g. "init") until you get to a valid substring. The parser described in the standard is simply not up to the job. You might think we can err on the side of caution, and accumulate more characters than we need, and let `strtod` use what it wants (e.g. halt stage 2 after accumulating "infinit" and let `strtod` just parse "inf"). But no, it must return zero and give you an error: https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.4

As such I can see no way that we can actually meet the following criteria simultaneously:

- Be compliant with the standard, e.g. stick to the algorithm it describes
- Handle all valid floating point inputs
- Always return the same result as `strtod` would

The initial problem I wanted to solve is that "1.2f" is badly parsed. This is not a valid floating point number. "1.2" is valid. The "f" should be ignored, i.e. not passed to stage 2. It was not ignored because "f" is a valid hex digit. But the existing code had no way of keeping track of whether hex digits are valid at the current point in the string. I have added this here because it is relatively simple to do without a complete refactor, and it still looks like the standard if you squint.

Adding the ability to backtrack to handle garden path inputs is a significant deviation from the standard imo. This change should increase the number of correctly handled cases and hopefully not break any that were handled correctly before, but it is not perfect or complete.
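The hex-tracking idea described in this comment can be illustrated with a heavily simplified, standalone sketch. This is not the libc++ implementation: `accumulate_stage2` is a hypothetical function that ignores locales, digit grouping and INF/NAN, and only models the rule that a character is accumulated if it is valid at its position, with hex digits valid only after "0x" has been seen and each of 'x' and the exponent marker allowed once.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Simplified, hypothetical Stage 2: returns the longest prefix of `in` in which
// every character is acceptable at its position. No grouping, no INF/NAN.
inline std::string accumulate_stage2(const std::string& in) {
    std::string buf;
    bool hex = false;       // set once "0x" or "0X" has been seen
    bool seen_dot = false;  // at most one decimal point
    bool seen_exp = false;  // at most one exponent marker
    for (char c : in) {
        const char prev = buf.empty() ? '\0' : buf.back();
        const char up = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        const bool after_exp =
            seen_exp && (prev == 'e' || prev == 'E' || prev == 'p' || prev == 'P');
        bool ok = false;
        if (c == '+' || c == '-') {
            ok = buf.empty() || after_exp;  // sign only first or right after the exponent marker
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            ok = true;
        } else if (c == '.') {
            ok = !seen_dot && !seen_exp;
            seen_dot = seen_dot || ok;
        } else if (up == 'X') {
            // 'x' is valid only once, directly after a leading (optionally signed) "0".
            ok = !hex && (buf == "0" || buf == "+0" || buf == "-0");
            hex = hex || ok;
        } else if (up == (hex ? 'P' : 'E')) {
            ok = !seen_exp && !buf.empty();  // exponent marker: not first, not repeated
            seen_exp = seen_exp || ok;
        } else if (up >= 'A' && up <= 'F') {
            ok = hex && !seen_exp;  // hex digits require the 0x prefix
        }
        if (!ok)
            break;  // first invalid character ends accumulation; no backtracking
        buf.push_back(c);
    }
    return buf;
}
```

With these rules, "1.2f" stops after "1.2" because 'f' is not valid without a preceding "0x", while "0x123.4f" is accumulated whole. The sketch has no backtracking, so it shares the garden-path limitation discussed in this thread.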
Quuxplusone (inline comment, Not Done): I'd still like to see those tests, because otherwise we have codepaths that we're not exercising, which means they could segfault, or worse, for all we know. I see two not-mutually-exclusive alternatives:

- Figure out the "least common correct denominator" of all implementations, and add a minimal coverage test in test/std/ for it. If the least common denominator is just "we should be able to call it without crashing," then test `try { (void)thething("INFINITY"); } catch (...) {}`. But we should still hit the codepath.
- And/or, figure out the "current behavior of libc++," and add a regression test in test/libcxx/, so that we test our behavior, and then we'll know if it ever regresses by accident. Because we do care about regressions in this area, right? (That's why you're introducing the ABI flag in the first place.)

The responsible thing to do would probably be to add both of these approaches, honestly, now that I've said them out loud.

tmatheson (author, inline comment, Done): Sure, I can add something like this to test/std:

    // FIXME these are all currently known to give the wrong answer
    TEST("INFINITY", 8, 0.0, ios.failbit);
    // FIXME it is unclear what the correct answer is in these cases
    TEST("INAN", 1, 0.0, ios.failbit);
    TEST("INFINITY", 3, INFINITY, ios.goodbit);
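The "call it without crashing" alternative could look roughly like the sketch below. It is an illustration only: `extracts_without_throwing` is a hypothetical name, and it goes through `std::istringstream` rather than the `num_get` facet harness used by the tests in this patch. It deliberately asserts nothing about the parsed value or the stream state, since those are the contested parts.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical smoke test: returns true if formatted extraction of a double
// from `text` completes without throwing. The result is intentionally unchecked.
inline bool extracts_without_throwing(const std::string& text) {
    try {
        std::istringstream in(text);
        double value = 0.0;
        in >> value;  // exercises the num_get<char> floating-point path
        (void)value;
        return true;
    } catch (...) {
        return false;
    }
}

// Example use over the inputs raised in this thread.
inline void smoke_test_garden_path_inputs() {
    assert(extracts_without_throwing("INFINITY"));
    assert(extracts_without_throwing("INFINITE"));
    assert(extracts_without_throwing("INAN"));
}
```

A test of this shape only guarantees the codepath is reached; pinning down the actual values would be the job of the separate regression test suggested for test/libcxx/.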
				TEST("0x123.4+", 7, hexfloat<float>(0x123, 0x4, 0), ios.goodbit);
				// TEST("NININININ", 3, NAN, ios.goodbit);
				// TEST("NANANANAN", 3, NAN, ios.goodbit);

				// Shouldn't recognise e, p or x more than once
				TEST("123.4e-5e-4", 8, 123.4e-5f, ios.goodbit);
				TEST("0x123.4p-5p-4", 10, hexfloat<float>(0x123, 0x4, -5), ios.goodbit);
				TEST("0x123x5", 5, hexfloat<float>(0x123, 0x0, 0), ios.goodbit);

				// Invalid (non-float) inputs
				TEST("a", 0, 0.0f, ios.failbit);
				TEST("e", 0, 0.0f, ios.failbit);
				TEST("f", 0, 0.0f, ios.failbit);
				TEST("p", 0, 0.0f, ios.failbit);
				TEST("M", 0, 0.0f, ios.failbit);
				TEST("{}", 0, 0.0f, ios.failbit);
				TEST("x123", 0, 0.0f, ios.failbit);

				// Incomplete inputs, i.e. eof before finished parsing
				TEST("-", 1, 0.0f, ios.eofbit \| ios.failbit);
				TEST("+", 1, 0.0f, ios.eofbit \| ios.failbit);
				TEST("0x123.4p", 8, 0.0f, ios.eofbit \| ios.failbit);

	{			{
	const char str[] = "123";			const char str[] = "123";
	assert((ios.flags() & ios.basefield) == ios.dec);			assert((ios.flags() & ios.basefield) == ios.dec);
	assert(ios.getloc().name() == "C");			assert(ios.getloc().name() == "C");
	std::ios_base::iostate err = ios.goodbit;			std::ios_base::iostate err = ios.goodbit;
	input_iterator<const char*> iter =			input_iterator<const char*> iter =
	f.get(input_iterator<const char*>(str),			f.get(input_iterator<const char*>(str),
	input_iterator<const char*>(str+sizeof(str)),			input_iterator<const char*>(str+sizeof(str)),
	ios, err, v);			ios, err, v);
	assert(iter.base() == str+sizeof(str)-1);			assert(iter.base() == str+sizeof(str)-1);
	assert(err == ios.goodbit);			assert(err == ios.goodbit);
	assert(v == 123);			assert(v == 123);
	}			}
	{			{
	const char str[] = "-123";			const char str[] = "-123";
miyuki (inline comment, Done): I think there is a reason behind having many similar-looking blocks: each assert() call is located on a separate line, so an assertion failure message will indicate which assertion failed. Factoring out all checks into a single function will make error messages much less informative.

tmatheson (author, inline comment, Done): This was addressed by using a macro for new tests and not refactoring the old tests.
	std::ios_base::iostate err = ios.goodbit;			std::ios_base::iostate err = ios.goodbit;
	input_iterator<const char*> iter =			input_iterator<const char*> iter =
	f.get(input_iterator<const char*>(str),			f.get(input_iterator<const char*>(str),
	input_iterator<const char*>(str+sizeof(str)),			input_iterator<const char*>(str+sizeof(str)),
	ios, err, v);			ios, err, v);
	assert(iter.base() == str+sizeof(str)-1);			assert(iter.base() == str+sizeof(str)-1);
	assert(err == ios.goodbit);			assert(err == ios.goodbit);
	assert(v == -123);			assert(v == -123);
	▲ Show 20 Lines • Show All 148 Lines • Show Last 20 Lines

libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_float_common.h

This file was added.

				#ifndef GET_FLOAT_COMMON_H
				#define GET_FLOAT_COMMON_H

				/// Read a double from the input string, check that the expected number of
				/// characters are read, the expected value is returned, and the expected
				/// error is set.
				#define TEST(STR, EXPECTED_LEN, EXPECTED_VAL, EXPECTED_ERR) \
				{ \
				std::ios_base::iostate err = ios.goodbit; \
				input_iterator<const char*> iter = f.get( \
				input_iterator<const char*>((STR)), \
				input_iterator<const char*>((STR) + strlen((STR))), ios, err, v); \
				assert(iter.base() == (STR) + (EXPECTED_LEN) && \
				"read wrong number of characters"); \
				assert(err == (EXPECTED_ERR)); \
				if (std::isnan(EXPECTED_VAL)) \
				assert(std::isnan(v) && "expected NaN value"); \
				else \
				assert(v == (EXPECTED_VAL) && "wrong value"); \
				}

				#endif

libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_long_double.pass.cpp

	Show All 15 Lines
	#include <locale>			#include <locale>
	#include <ios>			#include <ios>
	#include <cassert>			#include <cassert>
	#include <streambuf>			#include <streambuf>
	#include <cmath>			#include <cmath>
	#include "test_macros.h"			#include "test_macros.h"
	#include "test_iterators.h"			#include "test_iterators.h"
	#include "hexfloat.h"			#include "hexfloat.h"
				#include "get_float_common.h"

	typedef std::num_get<char, input_iterator<const char*> > F;			typedef std::num_get<char, input_iterator<const char*> > F;

	class my_facet			class my_facet
	: public F			: public F
	{			{
	public:			public:
	explicit my_facet(std::size_t refs = 0)			explicit my_facet(std::size_t refs = 0)
	: F(refs) {}			: F(refs) {}
	};			};


	int main(int, char**)			int main(int, char**)
	{			{
	const my_facet f(1);			const my_facet f(1);
	std::ios ios(0);			std::ios ios(0);
	long double v = -1;			long double v = -1;

				// Valid floating point formats where whole string is consumed
Quuxplusone (inline comment, Not Done): Is this ever defined during testing, though?
				TEST("0x123.4f", 8, hexfloat<long double>(0x123, 0x4f, 0), ios.eofbit);
				TEST("inf", 3, INFINITY, ios.goodbit \| ios.eofbit);
				TEST("INFINITY", 8, INFINITY, ios.eofbit \| ios.goodbit);

				// Valid floating point formats with unparsed trailing characters
				TEST("123.4f", 5, 123.4l, ios.goodbit);
				TEST("123xyz", 3, 123.0l, ios.goodbit);
				TEST("0x123.4+", 7, hexfloat<long double>(0x123, 0x4, 0), ios.goodbit);
				// TEST("NININININ", 3, NAN, ios.goodbit);
				// TEST("NANANANAN", 3, NAN, ios.goodbit);

				// Shouldn't recognise e, p or x more than once
				TEST("123.4e-5e-4", 8, 123.4e-5l, ios.goodbit);
				TEST("0x123.4p-5p-4", 10, hexfloat<long double>(0x123, 0x4, -5),
				ios.goodbit);
				TEST("0x123x5", 5, hexfloat<long double>(0x123, 0x0, 0), ios.goodbit);

				// Invalid (non-float) inputs
				TEST("a", 0, 0.0l, ios.failbit);
				TEST("e", 0, 0.0l, ios.failbit);
				TEST("f", 0, 0.0l, ios.failbit);
				TEST("p", 0, 0.0l, ios.failbit);
				TEST("M", 0, 0.0l, ios.failbit);
				TEST("{}", 0, 0.0l, ios.failbit);
				TEST("x123", 0, 0.0l, ios.failbit);

				// Incomplete inputs, i.e. eof before finished parsing
				TEST("-", 1, 0.0l, ios.eofbit \| ios.failbit);
				TEST("+", 1, 0.0l, ios.eofbit \| ios.failbit);
				TEST("0x123.4p", 8, 0.0l, ios.eofbit \| ios.failbit);

	{			{
	const char str[] = "123";			const char str[] = "123";
	assert((ios.flags() & ios.basefield) == ios.dec);			assert((ios.flags() & ios.basefield) == ios.dec);
	assert(ios.getloc().name() == "C");			assert(ios.getloc().name() == "C");
	std::ios_base::iostate err = ios.goodbit;			std::ios_base::iostate err = ios.goodbit;
	input_iterator<const char*> iter =			input_iterator<const char*> iter =
	f.get(input_iterator<const char*>(str),			f.get(input_iterator<const char*>(str),
	input_iterator<const char*>(str+sizeof(str)),			input_iterator<const char*>(str+sizeof(str)),
	▲ Show 20 Lines • Show All 222 Lines • Show Last 20 Lines

Revision Contents

Diff 334122