This is an archive of the discontinued LLVM Phabricator instance.

Get all localization tests passing on linux!
AbandonedPublic

Authored by EricWF on Aug 11 2014, 9:33 PM.

Details

Summary

This patch gets all localization tests passing on linux.
It doesn't actually change any code in locale. Only tests.
Most of the failures were due to different locale behavior between apple and linux.
Other failures seem to be related to bugs in GLIBCs handling of some characters.

Diff Detail

Event Timeline

EricWF updated this revision to Diff 12383.Aug 11 2014, 9:33 PM
EricWF retitled this revision from to Get all localization tests passing on linux!.
EricWF updated this object.
EricWF edited the test plan for this revision. (Show Details)
EricWF added reviewers: mclow.lists, danalbert.
EricWF set the repository for this revision to rL LLVM.
EricWF added a subscriber: Unknown Object (MLST).
danalbert edited edge metadata.Aug 14 2014, 8:43 PM

I think I may have misled you when I said we should #ifdef the differences between glibc and Mac. If there are legitimate differences, we should #ifdef them. If glibc is wrong (it looks like it often is), we should just XFAIL the test and file a bug against glibc (or does that data come from an OS package?).

I usually only commented on one test of a given type, but the comments apply to all similar.

test/localization/locale.categories/category.ctype/locale.ctype.byname/narrow_1.pass.cpp
11 ↗(On Diff #12383)

Doh! My mistake.

test/localization/locale.categories/category.ctype/locale.ctype.byname/tolower_1.pass.cpp
38

I think you meant to #ifdef this one as you did in the next test.

Either way, this doesn't look right to me. Do we know why this is happening? Is this a legitimate difference between Mac and glibc, or is glibc broken? If it's just a difference, then we should still check against the expected glibc behavior. If Linux is broken, we should XFAIL the test rather than skipping the broken part.

test/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp
11

We're XFAILing these as a TODO, right? I wish LIT had a way to distinguish between intended failures and known failures.

test/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_zh_CN.pass.cpp
311

I think glibc's locale data is wrong here. We should mark these tests as XFAIL rather than hiding the failure. A bug will need to be filed against glibc (or wherever glibc gets their locale data from).

test/localization/locale.categories/category.monetary/locale.moneypunct.byname/decimal_point.pass.cpp
133

The test is right, glibc's locale data is wrong. Russia uses ',' for a decimal separator (according to wikipedia). This test (and those like it) should be XFAIL.

test/localization/locale.categories/category.time/locale.time.get.byname/get_date.pass.cpp
63

Looks like more bad locale data to me.

79

What?

92

Looks like it wouldn't be the first time :)

test/localization/locale.categories/category.time/locale.time.get.byname/get_date_wide.pass.cpp
86

Then let's XFAIL it for now.

test/localization/locale.categories/category.time/locale.time.get.byname/get_one.pass.cpp
47

There should be an #else case that tests the Linux behavior.

70

Pretty sure this is more bad locale data.

117

Phabricator: "Context not available"
Me: Damn you, Phabricator!

*checks the source*

Well, that didn't make it any clearer. I presume that mess of hex is encoding cyrillic characters. Are these tests like the other Russian ones where the separators are wrong? If so, XFAIL, not #ifdef.

test/localization/locale.categories/category.time/locale.time.put.byname/put1.pass.cpp
73

What's with the case differences? I see that was actually already the case, but why?

test/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp
60

Looks like more glibc wrongness.

Hi Dan,

Thanks for the review. I'll follow your suggestions. However I want to ask a question. Since the glibc data is frequently wrong we sacrifice a lot of test coverage on linux by marking the tests
XFAIL. Is there a point at which test coverage should trump glibc wrongness?

Hi Dan,

I've addressed most of your comments. Thanks for the input.
I'll post a revised patch closely after.

test/localization/locale.categories/category.ctype/locale.ctype.byname/tolower_1.pass.cpp
38

FreeBSD also fails on these tests as well for the same reason and I suspect darwin does as well. This needs more investigation.

test/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp
11

Yep. I'll add a comment saying its TODO and explain the likely cause (translation of U00A0).

test/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_zh_CN.pass.cpp
311

GLIBC's locale definition doesn't seem that unreasonable. I'll look more into it but I don't know if we can say its test data is for sure wrong.
I'm going to hold off on changing this.

test/localization/locale.categories/category.monetary/locale.moneypunct.byname/decimal_point.pass.cpp
133

Yep. That seems wrong. I'll back out the GLIBC changes and just mark it as XFAIL. I'll also file a bug with GLIBC.

test/localization/locale.categories/category.time/locale.time.get.byname/get_date.pass.cpp
63

I think this might be bad locale data on apples part as opposed to GLIBCs. I'm basing that off
http://www.localeplanet.com/icu/fr-FR/
http://en.wikipedia.org/wiki/Date_format_by_country

79

GLIBC's locale data says that U002E is used as the separator.
http://www.fileformat.info/info/unicode/char/2e/index.htm

I don't think that seems unreasonable. More investigation needed though.

test/localization/locale.categories/category.time/locale.time.get.byname/get_date_wide.pass.cpp
86

Ok. Done.

test/localization/locale.categories/category.time/locale.time.get.byname/get_one.pass.cpp
47

Not quite sure how to write one. The output depends on the timezone your in.

70

Probably, but I think GLIBC is correct. Having 12 hour time for en_US seems to make more sense than 24 hour.

117

Done. Changed backed out. I suspect the difference is between upper and lower case cyrillic characters. However more investigation is needed.

test/localization/locale.categories/category.time/locale.time.put.byname/put1.pass.cpp
73

Not sure. glibc uses lower case. I've seperated the glibc change and only check the lower case version.

test/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp
60

Perhaps. Some of the sources I've found online suggest that ' ' is valid.
http://docs.oracle.com/cd/E19455-01/806-0169/overview-9/index.html
I'll leave this in for now.

EricWF updated this revision to Diff 12596.Aug 17 2014, 5:03 PM
EricWF edited edge metadata.

Implement most of Dan's comments. I've also backed out any changes to ctype tests. They need more investigation. They are just marked xfail now.

jroelofs added inline comments.
test/localization/locale.categories/category.time/locale.time.get.byname/get_one.pass.cpp
117

@danalbert: the "Context not available" bit isn't phabricator's fault. This happens when a patch is made without '-U999'.

test/localization/locale.categories/category.time/locale.time.put.byname/put1.pass.cpp
73

The lowercase versions of that abbreviation are suspect to me. IIRC French folks capitalize it always.

emaste added a subscriber: emaste.Aug 18 2014, 11:41 AM
danalbert added inline comments.Aug 18 2014, 5:16 PM
test/localization/locale.categories/category.ctype/locale.ctype.byname/tolower_1.pass.cpp
38

Okay. Still, if it needs more investigation then it needs more investigation, but we definitely shouldn't be commenting out anything.

test/localization/locale.categories/category.time/locale.time.get.byname/get_date.pass.cpp
79

But 0x2e == '.'

test/localization/locale.categories/category.time/locale.time.get.byname/get_one.pass.cpp
70

Yeah, that's strange. Does this test fail on Mac? If it doesn't, it probably should.

Thanks for the comments. It seems not all of the changes made it into the last patch. I'll implement your advice and back out all of the categories.ctype tests and just mark them XFAIL.

test/localization/locale.categories/category.ctype/locale.ctype.byname/tolower_1.pass.cpp
38

Yeah, Sorry I thought I took that out and just marked it XFAIL.

test/localization/locale.categories/category.time/locale.time.get.byname/get_date.pass.cpp
79

I must have been tired. Thanks for the correction. I'll see what up with these tests.

test/localization/locale.categories/category.time/locale.time.get.byname/get_one.pass.cpp
70

I'll double check on this. I think this test passes on mac.

It might be worth splitting this up into a couple patches. I think all the XFAILs can probably be submitted (just the XFAIL line, not the rest of the changes to the file). It would clean up a lot of the test results so you can have less noise tracking down the others, as well as making the next update easier to review.

EricWF abandoned this revision.Aug 20 2014, 8:07 PM

Splitting this revision up into smaller parts.