This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] [test] Fix the collate compare test for Glibc, Windows and FreeBSD
ClosedPublic

Authored by mstorsjo on Mar 2 2022, 1:21 AM.

Details

Summary

The old expected behaviour was specific to Apple platforms, while Glibc, Windows and FreeBSD collate differently (ignoring case). Make the old tested behaviour a special case for Apple platforms, and make the default case the one used by the other three.

In clang-cl/DLL configurations, the test is hit by https://llvm.org/PR41018 (making the test fail to link).

Diff Detail

Event Timeline

mstorsjo created this revision.Mar 2 2022, 1:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 1:21 AM
mstorsjo requested review of this revision.Mar 2 2022, 1:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 1:21 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Quuxplusone requested changes to this revision.Mar 2 2022, 5:42 AM
Quuxplusone added inline comments.
libcxx/test/std/localization/locale.categories/category.collate/locale.collate.byname/compare.pass.cpp
21–24

Both glibc and Win32 have a different collation order from libc++-on-Apple. Are they the same collation order? Is this just a bug on Apple, perhaps?

That said, lines 47-62 totally reasonable to me. They're using std::collate in the en_US.UTF-8 locale to compare "aaaaaaA" against "BaaaaaA", and expecting "B" to collate after "a" in dictionary order. That seems totally fine and correct to me.
Lines 66-81 look fishy. They're doing the same dictionary-order test, but in the C locale! Isn't the C locale supposed to use basically strcmp to do its collation? B should sort before a in strcmp order. I suspect that perhaps this test should be flipped, enabled on Linux and XFAILed (if necessary) on Apple targets.

This revision now requires changes to proceed.Mar 2 2022, 5:42 AM
mstorsjo updated this revision to Diff 412510.Mar 2 2022, 12:30 PM

Rewrote the test to make it pass on Glibc and Windows (and also tested on FreeBSD, even if that's not available in CI).

mstorsjo retitled this revision from [libcxx] [test] Explain and clarify a Windows XFAIL to [libcxx] [test] Fix the collate compare test for Glibc, Windows and FreeBSD.Mar 2 2022, 12:31 PM
mstorsjo edited the summary of this revision. (Show Details)

I'm not 100% happy with the wording of the comments "Apple seems to collate lexicographically" and "Glibc, Windows and FreeBSD collate ignoring case here" — I think "lexicographically" is ambiguous. Lexicographical order unambiguously describes how to order variable-length sequences of elements given that you know how to compare two elements, but it doesn't imply anything about the way in which you compare those elements. A dictionary that sorts "run, Runescape, runt" is still lexicographically ordered; it just happens to collate r and R together.

How about "Apple's default collation is case-sensitive" and "Glibc, Windows, and FreeBSD's default collation is case-insensitive"?

How about "Apple's default collation is case-sensitive" and "Glibc, Windows, and FreeBSD's default collation is case-insensitive"?

Sure, that sounds good to me - I'll update my local copy of the patch that way.

ldionne accepted this revision.Mar 2 2022, 12:54 PM
ldionne added a subscriber: ldionne.

It's kind of crazy that different C libraries have such fundamental differences.

This revision is now accepted and ready to land.Mar 2 2022, 12:54 PM
Mordante accepted this revision.Mar 2 2022, 11:25 PM

It's kind of crazy that different C libraries have such fundamental differences.

Agreed, this isn't what I expect of locales :-( But since this is how libc++ get's the information LGTM.

EricWF added a subscriber: EricWF.Mar 10 2022, 2:30 PM
EricWF added inline comments.
libcxx/test/std/localization/locale.categories/category.collate/locale.collate.byname/compare.pass.cpp
36

The indirection here makes it harder to understand what the test is actually doing. For example, the actual object under test, the facet, isn't visibly used at the callsites.
"Don't repeat yourself" might be a good maxim most places, but it's totally fine for tests to contain repeated code.