This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] [test] Fix the classic_table test on Windows
ClosedPublic

Authored by mstorsjo on Mar 2 2022, 1:29 AM.

Details

Summary

Whether we can check for e.g. F::alpha and F::print corresponds
to the _LIBCPP_CTYPE_MASK_IS_COMPOSITE_PRINT and
_LIBCPP_CTYPE_MASK_IS_COMPOSITE_ALPHA defines - but we can't check
those in a standard test.

Diff Detail

Event Timeline

mstorsjo created this revision.Mar 2 2022, 1:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 1:29 AM
mstorsjo requested review of this revision.Mar 2 2022, 1:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 1:29 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript

IMHO this test starts out really bad, but these changes don't improve it in the right direction. I'd suggest either
(1) leave it XFAIL'ed for Windows with no FIXME, or
(2) completely rewrite it in the form

#ifdef _WIN32
const mask expected_table[] = {
    [...]
};
#else
const mask expected_table[] = {
    0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x24200, 0x04200, 0x04200, 0x04200, 0x04200, 0x00200, 0x00200,
    0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200, 0x00200,
    0x64000, 0x42800, 0x42800, 0x42800, 0x42800, 0x42800, 0x42800, 0x42800, 0x42800, 0x42800, 0x42800, 0x42800, 0x42800, 0x42800, 0x42800, 0x42800,
    0x50c00, 0x50c01, 0x50c02, 0x50c03, 0x50c04, 0x50c05, 0x50c06, 0x50c07, 0x50c08, 0x50c09, 0x42800, 0x42800, 0x42800, 0x42800, 0x42800, 0x42800,
    0x42800, 0x5890a, 0x5890b, 0x5890c, 0x5890d, 0x5890e, 0x5890f, 0x48900, 0x48900, 0x48900, 0x48900, 0x48900, 0x48900, 0x48900, 0x48900, 0x48900,
[...]
    0x00000, 0x00000, 0x00000, 0x00000, 0x00000, 0x00000, 0x00000, 0x00000, 0x00000, 0x00000, 0x00000, 0x00000, 0x00000, 0x00000, 0x00000, 0x00000,
};
#endif
assert(F::table_size == 256);
assert(std::equal(p, p + 256, expected_table));

(3) What about z/OS and EBCDIC? I bet @NancyWang2222 and @SeanP are eventually going to get around to this test, and then it's going to have to undergo surgery again. So we should plan ahead for that, too.

IMHO this test starts out really bad, but these changes don't improve it in the right direction. I'd suggest either
(1) leave it XFAIL'ed for Windows with no FIXME, or
(2) completely rewrite it
(3) What about z/OS and EBCDIC? I bet @NancyWang2222 and @SeanP are eventually going to get around to this test, and then it's going to have to undergo surgery again. So we should plan ahead for that, too.

Yes, this isn't really pretty... FWIW, MS STL also has slightly different values for all these (and has a slightly different literal values for C++ ctype constants too). So this test right now tests way, way more than what the C++ standard would imply, I think.

So should we reduce the test down to first just minimal testing based on what the standard implies (call the function to make sure it doesn't crash, then check that e.g. uppercase/lowercase ASCII 'A' is flagged correctly - although I guess we can't assume that either, for EBCDIC), and then for specific platforms and/or C++ library implementations, do the full check like we do here?

mstorsjo updated this revision to Diff 412541.Mar 2 2022, 2:18 PM

Add simpler, less stringent checking of properties for a handful of characters, as a test that should pass on any C++ standard library. (Now it passes on MS STL too.) On platforms other than Windows, do the preexisting, stringent check of all aspects of all entries (which were passing on all other CI platforms before).

Quuxplusone added inline comments.Mar 4 2022, 8:34 AM
libcxx/test/std/localization/locale.categories/category.ctype/facet.ctype.special/facet.ctype.char.statics/classic_table.pass.cpp
29–33

This seems like an improvement over the status quo, so if you want to stop here, I'm OK with that.
But maybe this test should just duplicate all the relevant logic — normally an antipattern, maybe still an antipattern, but...—

// in a loop for c from 0 to 255 or whatever
bool expect_cntrl = (c <= 31) || (127 <= c);
bool expect_lower = ('a' <= c && c <= 'z');
bool expect_print = (32 <= c && c <= 126);
bool expect_punct = (33 <= c && c <= 47) || (58 <= c && c <= 64) || (91 <= c && c <= 96) || (123 <= c && c <= 126);
bool expect_upper = ('A' <= c && c <= 'Z');

assert(bool(p[int(c)] & F::alpha) == (expect_upper || expect_lower));
assert(bool(p[int(c)] & F::lower) == expect_lower);
assert(bool(p[int(c)] & F::print) == expect_print);
assert(bool(p[int(c)] & F::punct) == expect_punct);
~~~

And then for Win32 you just #if the bool expect_blank = case to something different; and for z/OS eventually, we just #if all the expect_s to other things.

110–112

Does this imply that std::isblank('\n') is true on Windows? According to cppreference, that's a bug: "In the default C locale, only space (0x20) and horizontal tab (0x09) are classified as blank characters."

mstorsjo updated this revision to Diff 413091.Mar 4 2022, 12:02 PM

Updated as suggested; use bool expected_foo and check only that bool(p[i] & F::foo) == expected_foo, in general. For non-Windows, we can keep checking that all bits are set exactly as we want them.

Skip checking the F::blank bit for i < 14, it isn't set quite as expected.

mstorsjo added inline comments.Mar 4 2022, 12:11 PM
libcxx/test/std/localization/locale.categories/category.ctype/facet.ctype.special/facet.ctype.char.statics/classic_table.pass.cpp
29–33

Ok, I did something like that, please take a look.

110–112

No, ::isblank('\n') does return 0 (MS STL doesn't seem to have a single-argument std::isblank())

In MS STL, F::blank is actually a mask consisting of the C runtime level _BLANK and _SPACE, but the isblank() function probably only looks at the _BLANK bit. See https://github.com/microsoft/STL/blob/main/stl/inc/xlocale#L2329-L2353.

And for the horizontal tab character, the raw table is missing the blank bit, but the isblank() function works around it: https://github.com/microsoft/win32metadata/blob/master/generation/WinSDK/RecompiledIdlHeaders/ucrt/ctype.h#L201

Thus, the high level stuff seems to work pretty reasonably, but the raw F::classic_table() table and bitmasks are so-so.

Quuxplusone added a reviewer: ldionne.

LGTM%comment, but I'll ask for a second approver too.

libcxx/test/std/localization/locale.categories/category.ctype/facet.ctype.special/facet.ctype.char.statics/classic_table.pass.cpp
29–33

Throughout, please prefer the (lo <= x && x <= hi) style I used above.
https://quuxplusone.github.io/blog/2021/05/05/its-clamping-time/
Personally I wouldn't do the space-alignment in expect_punct, but I guess I don't really care.

For the _WIN32 codepath, I'd prefer you put the ifdef around bool expect_blank =, like

#ifdef _WIN32
bool expect_blank = (9 <= i && i <= 13) || (i == 32);
#else
bool expect_blank = (i == '\t' || i == ' ');
#endif
mstorsjo added inline comments.Mar 6 2022, 1:22 AM
libcxx/test/std/localization/locale.categories/category.ctype/facet.ctype.special/facet.ctype.char.statics/classic_table.pass.cpp
29–33

Ok, I can change the formatting of those conditions.

Regarding the blank bit on Windows, I explicitly meant to skip the check entirely for a couple characters, to make the test loose enough that MS STL passes too (your suggestion passes with our code but not with theirs).

On MS STL, the F::blank value is set to _BLANK | _SPACE, while we have it set to just _BLANK. So to cope with that, I’d just skip checking that flag for the lower chars.

I could also skip checking expect_blank on Windows altogether, concluding that there’s some mess with that bit. Or just ignore the MS STL case and adjust the expectation for i == '\t’ which is enough for us and test expect_blank for all chars.

Quuxplusone added inline comments.
libcxx/test/std/localization/locale.categories/category.ctype/facet.ctype.special/facet.ctype.char.statics/classic_table.pass.cpp
29–33

As long as the situation is deterministic, my preference is almost always to keep the test coverage. It sounds like in this case you should just add a new #if, like

#ifdef TEST_COMPILER_MSVC
bool expect_blank = (????);
#elif defined(_WIN32)
bool expect_blank = (9 <= i && i <= 13) || (i == 32);
#else
bool expect_blank = (i == '\t' || i == ' ');
#endif

Or, if that wouldn't deterministically pass for some reason (like, the behavior changes between different versions of MSVC and the buildbot can't tell what version it's running, or if we have intel that MSVC is actively working on fixing this), then at least we should XFAIL: msvc this whole test, and then if it ever starts passing we could re-investigate. (And by "we" I assume I really mean @CaseyCarter. :))

mstorsjo added inline comments.Mar 6 2022, 8:38 AM
libcxx/test/std/localization/locale.categories/category.ctype/facet.ctype.special/facet.ctype.char.statics/classic_table.pass.cpp
29–33

Ok, sure, that’s doable! And it turns out it ends up less ugly than earlier attempts too.

mstorsjo updated this revision to Diff 413302.Mar 6 2022, 8:38 AM

Explicitly spell out the expectation for MS STL, libc++ on Windows and others.

Quuxplusone added inline comments.Mar 7 2022, 2:15 PM
libcxx/test/std/localization/locale.categories/category.ctype/facet.ctype.special/facet.ctype.char.statics/classic_table.pass.cpp
33

(nit)

35–36

I'm OK with this, but personally I would have spelled out

// MS STL includes the _SPACE bit in F::blank
bool expect_blank = (9 <= i && i <= 13) || (i == ' ');

because it's cheap.

56

Stray } at the end of the comment?
Over-helpful IDE closing your '{ for you? ;)

69–88

I thought the goal was still to get rid of this big #ifndef block. And I think you can get rid of line 92 — it's now redundant with lines 64–73, right?

Would it be correct on all platforms to replace line 93 with something like this? If not, why not?

const mask defined_bits = (F::cntrl | F::print | F::space | F::blank | F::lower | F::upper | F::alpha | F::digit | F::xdigit | F::punct);
assert((p[i] & defined_bits) == 0);  // no bits are set outside the mask
mstorsjo added inline comments.Mar 7 2022, 2:26 PM
libcxx/test/std/localization/locale.categories/category.ctype/facet.ctype.special/facet.ctype.char.statics/classic_table.pass.cpp
33

Right, thanks

56

Actually, there might have been a helpful IDE for whoever wrote this originally, I'm just bringing the typo along from the original version :P Will fix.

69–88

The new tests are a more lenient way of checking these features; this enforces more aspects of it - on platforms where we want to assume that much.

No, line 92 is not redundant with the other conditions. Lines 64-73 check that for each constant (which may be one or more bits), we either have some bits set (if expected) or none of the bits set. Line 92 checks that every single bit in the expected bitmasks are set.

So for a platform with F::alpha = F::lower | F::upper, you'd get a failure because while we expect F::alpha to be set, we don't expect _all_ bits of F::alpha to be set.

Regarding line 93, no, this doesn't check what you suggest. Previously, this checked that we _don't_ have bits set that we don't expect. But actually, our lines 64-73 checks that aspect already, so we can get rid of this one.

mstorsjo updated this revision to Diff 413635.Mar 7 2022, 2:32 PM

Applied the suggestions

mstorsjo added inline comments.Mar 7 2022, 2:33 PM
libcxx/test/std/localization/locale.categories/category.ctype/facet.ctype.special/facet.ctype.char.statics/classic_table.pass.cpp
69–88

But on second thought, while it is a stricter check (so we don't lose any test strictness on existing supported platforms), I'm not sure that it adds much value to check that all bits of all the features are set either. So I would be totally ok with removing this ifdef altogether too.

mstorsjo updated this revision to Diff 413817.Mar 8 2022, 8:25 AM

Removed the old, overly strict ifdef block.

Mordante accepted this revision.Mar 8 2022, 10:54 AM

IMHO this test starts out really bad, but these changes don't improve it in the right direction. I'd suggest either
(1) leave it XFAIL'ed for Windows with no FIXME, or
(2) completely rewrite it
(3) What about z/OS and EBCDIC? I bet @NancyWang2222 and @SeanP are eventually going to get around to this test, and then it's going to have to undergo surgery again. So we should plan ahead for that, too.

Yes, this isn't really pretty... FWIW, MS STL also has slightly different values for all these (and has a slightly different literal values for C++ ctype constants too). So this test right now tests way, way more than what the C++ standard would imply, I think.

So should we reduce the test down to first just minimal testing based on what the standard implies (call the function to make sure it doesn't crash, then check that e.g. uppercase/lowercase ASCII 'A' is flagged correctly - although I guess we can't assume that either, for EBCDIC), and then for specific platforms and/or C++ library implementations, do the full check like we do here?

I'm not even sure the test will pass at all at EBCDIC at all, with the ranges used. Specifically expect_punct seems to expect ASCII. However I think when we want to support EBCDIC we either need a large #if or create a separate test for EBCDIC. At the moment we don't have a CI to test EBCDIC. I don't see that as a blocker for this patch.

LGTM!

This revision is now accepted and ready to land.Mar 8 2022, 10:54 AM
This revision was automatically updated to reflect the committed changes.