This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Correctly propagate user-defined lookup_classname().
Needs ReviewPublic

Authored by timshen on Sep 17 2017, 12:04 AM.

Details

Summary

Always lookup the class name, even when the traits type is regex_traits<>.
The lookup happens at regex compile time, so it shouldn't affect the performance.

I also added ja_JP.UTF-8 as a common locale.

This patch fixes PR18501.

Event Timeline

timshen created this revision.Sep 17 2017, 12:04 AM

I'm not sure if we need to change the CI server to suport ja_JP.UTF-8.

timshen updated this revision to Diff 115559.Sep 17 2017, 12:06 AM

Update description.

timshen edited the summary of this revision. (Show Details)Sep 17 2017, 12:06 AM
timshen updated this revision to Diff 115560.Sep 17 2017, 12:08 AM

Stylize template decl to "template <class ...".

Quuxplusone added inline comments.
libcxx/test/std/re/re.traits/lookup_classname_user_defined.pass.cpp
47

Could you add a test here for

std::wstring re3 = L"([[:ALNUM:]]+)";
std::regex_search(in, m, std::wregex(re3, std::regex_constants::icase));

std::wstring re4 = L"(\\W+)";
std::regex_search(in, m, std::wregex(re4, std::regex_constants::icase));

documenting the expected outputs? It's unclear to me from cppreference
http://en.cppreference.com/w/cpp/regex/regex_traits/lookup_classname
whether lookup_classname("W") is supposed to produce a result or not (but you seem to assume it does).

My understanding is that the "icase" parameter to lookup_classname is talking about the icaseness of the regex matcher; classnames should always be matched with exact case, i.e. [[:alnum:]] is always a valid classname and [[:ALNUM:]] is always invalid, regardless of regex_constants::icase. But I'm not sure.

timshen updated this revision to Diff 115597.Sep 17 2017, 11:14 PM

Propagate __icase_ correctly into lookup_classname.

timshen added inline comments.Sep 17 2017, 11:17 PM
libcxx/test/std/re/re.traits/lookup_classname_user_defined.pass.cpp
47

[re.req] says that, for lookup_classname(), "The value returned shall be independent of the case of the characters in the sequence."

I take it as regardless of lookup_classname()'s icase argument, [[:ALNUM:]] is always valid.

There are existing tests that confirms it in std/re/re.traits/lookup_classname.pass.cpp. Search for "AlNum".

I fixed my patch, since I was misunderstanding it as well (I thought icase is for the input char sequence). Now they are just forwarded into lookup_classname().

libcxx/test/std/re/re.traits/lookup_classname_user_defined.pass.cpp
47

[re.req] says that, for lookup_classname(), "The value returned shall be independent of the case of the characters in the sequence."

Huh. That seems like a defect in the Standard, since although lookup_classname() is parameterized on a locale, there is no standard way to get case-insensitive string comparison out of an arbitrary locale AFAIK. However, that's a problem for the implementer of lookup_classname(), not for you. Forwarding straight to lookup_classname() and letting it deal with questions of "case" sounds like the right approach to me.

If you *wanted* to go down this rabbit hole, a good test case would be "[[:DIGIT:]]" in Turkish locale (where lowercasing "[[:DIGIT:]]" produces "[[:dıgıt:]]" not "[[:digit:]]").

(Note— The only place I find "case-insensitive" in N4659 is in the informative note on time_get::get: "It is unspecified by what means the function performs case-insensitive comparison or whether multi-character sequences are considered while doing so." —end note)