This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] [Windows] Pick a unique bit for __regex_word
ClosedPublic

Authored by mstorsjo on Jan 25 2022, 2:00 PM.

Details

Summary

The old __regex_word aliased the mask for xdigit, causing stray
test failures.

The diff may look surprising, as if the previous faulty value had
been set specifically for Windows - but this is due to a restructuring
in 411c630bae0e0d50697651797709987e2cfea92d. Prior to that, there
were OS specific settings for some OSes, and one fallback used for
the rest (which turns out to not work for Windows).

Diff Detail

Event Timeline

mstorsjo requested review of this revision.Jan 25 2022, 2:00 PM
mstorsjo created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJan 25 2022, 2:00 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Quuxplusone added inline comments.
libcxx/include/__locale
554

Could we realistically put a future-proofing static_assert down here, something like

static_assert(__regex_word & ~(space | print | cntrl | upper | lower | alpha | digit | punct | xdigit | blank) == __regex_word);

and I would expect

static_assert(upper | lower == alpha);

also? And/or, could we make a test for this?

mstorsjo added inline comments.Jan 25 2022, 2:25 PM
libcxx/include/__locale
554

I guess such a static_assert for __regex_word would be good to have, yes.

static_assert(upper | lower == alpha); doesn't hold on Windows, _ALPHA is defined as 0x0100 | _UPPER | _LOWER actually.

mstorsjo updated this revision to Diff 403188.Jan 26 2022, 2:49 AM

Added a static assert.

Looks like the static assert is failing on AIX, indicating that the same issue is present there. CC @DanielMcIntosh-IBM @muiez

Would it be ok to land this without the static assert, fixing the issue for one platform at least? Then when the AIX case gets fixed, we can add the static assert. Also CC @daltenty for more AIX people.

muiez added inline comments.Jan 28 2022, 8:23 AM
libcxx/include/__locale
554

On z/OS, the following static_assert passes:

static_assert((__regex_word & ~(space | print | cntrl | upper | lower | alpha | digit | punct | xdigit | blank)) == __regex_word);

However, this one fails:

static_assert((upper | lower) == alpha);

Note: we should place parentheses around the | expression to evaluate it first since == has a higher precedence.

mstorsjo added inline comments.Jan 28 2022, 8:34 AM
libcxx/include/__locale
554

Yep, I added parentheses in that form in the updated patch. But that assert fails on AIX in the precommit testing.

The second assert fails on windows too, so I didn’t include it in the patch.

Would it be ok to land this without the static assert, fixing the issue for one platform at least? Then when the AIX case gets fixed, we can add the static assert.

FWIW, I'd prefer to land a static_assert now so we don't forget about it; but guard it under #ifndef _AIX or XFAIL: LIBCXX-AIX-FIXME or something like that (depending on where you put the assert), so that it's greppable for people working on AIX specifically.

Would it be ok to land this without the static assert, fixing the issue for one platform at least? Then when the AIX case gets fixed, we can add the static assert.

FWIW, I'd prefer to land a static_assert now so we don't forget about it; but guard it under #ifndef _AIX or XFAIL: LIBCXX-AIX-FIXME or something like that (depending on where you put the assert), so that it's greppable for people working on AIX specifically.

Oh, right, I could move the assert to a test - I placed it in the header so far, where it broke everything.

Would it be ok to land this without the static assert, fixing the issue for one platform at least? Then when the AIX case gets fixed, we can add the static assert.

FWIW, I'd prefer to land a static_assert now so we don't forget about it; but guard it under #ifndef _AIX or XFAIL: LIBCXX-AIX-FIXME or something like that (depending on where you put the assert), so that it's greppable for people working on AIX specifically.

Oh, right, I could move the assert to a test - I placed it in the header so far, where it broke everything.

Even in the header, it could be guarded by #ifndef _AIX. (That said, a test sounds good too. It'd just have to be in libcxx/test/libcxx/, right? because __regex_word is an implementation detail of libc++. Unless there's a way to regression-test the observable bug that was originally caused by this issue; that would be excellent too.)

mstorsjo updated this revision to Diff 404081.Jan 28 2022, 10:23 AM

Moved the assert into a test, temporarily.

mstorsjo updated this revision to Diff 404257.Jan 29 2022, 5:19 AM

Retry/rerun CI (which got failed on unrelated things last time)

mstorsjo updated this revision to Diff 404261.Jan 29 2022, 6:52 AM

Add a message to the static assert, hopefully fixing building in pre-C++17 modes.

libcxx/test/std/re/re.traits/lookup_classname.pass.cpp
33

(1) Couldn't this go in include/__locale today, just guarded with #if !defined(_AIX) and a TODO comment?
(2) I don't think this can go in libcxx/test/std/ because __regex_word isn't a standard name. It should go in libcxx/test/libcxx/ (unless you just put it in include/__locale).

mstorsjo added inline comments.Jan 29 2022, 9:12 AM
libcxx/test/std/re/re.traits/lookup_classname.pass.cpp
33

Yeah, that should work too, and good point about this being a standard test. (I went with placing it in a test instead of in the header, to break less if there’s other platforms where it doesn’t pass, that aren’t covered by CI. But if it isn’t covered by CI, we can’t give any promises I guess.)

mstorsjo updated this revision to Diff 404287.Jan 29 2022, 10:03 AM

Moved the static assert back to the header, as suggested.

LGTM now!

@daltenty: would it be useful for someone to file a GitHub issue to remember to fix this on AIX? ...well, I guess the XFAIL: LIBCXX-AIX-FIXME on the relevant tests may be good enough. Anyway, I'll leave it to the AIX people to figure out what they want to do.

ldionne accepted this revision.Jan 31 2022, 12:34 PM
This revision is now accepted and ready to land.Jan 31 2022, 12:34 PM
This revision was automatically updated to reflect the committed changes.

This is failing for Newlib / Picolibc too (defining _NEWLIB_VERSION).

As far as I can tell, the mask there is a char and all 8 bits are currently defined in newlib:

#define _U      01
#define _L      02
#define _N      04
#define _S      010
#define _P      020
#define _C      040
#define _X      0100
#define _B      0200

What would be the proper fix ?

This is failing for Newlib / Picolibc too (defining _NEWLIB_VERSION).

As far as I can tell, the mask there is a char and all 8 bits are currently defined in newlib:

#define _U      01
#define _L      02
#define _N      04
#define _S      010
#define _P      020
#define _C      040
#define _X      0100
#define _B      0200

What would be the proper fix ?

Hmm, that looks problematic. I think you'd have to enlargen the mask type, which I would guess breaks the libc++ ABI for that platform - and I don't know if that one needs to match some size used by the libc too.