This is an archive of the discontinued LLVM Phabricator instance.

[libcxx][pretty printers] Correct locale for u16/u32 string tests
ClosedPublic

Authored by DavidSpickett on Oct 5 2021, 5:29 AM.

Details

Summary

When the locale is not some UTF-8 these tests fail.
(different results for python2 linked gdbs vs. python3
but same issue)

Setting the locale just for the test works around this.
By default Ubuntu comes with just C.UTF-8. I've chosen
to use en_US.UTF-8 instead given that my Mac doesn't have
the former and there's a slim chance this test might run there.

This also enables the u16string tests which are now passing.

Diff Detail

Event Timeline

DavidSpickett created this revision.Oct 5 2021, 5:29 AM
DavidSpickett requested review of this revision.Oct 5 2021, 5:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2021, 5:30 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript

You can reproduce by building the CI docker image with Focal instead of Bionic.

The default gdb 9.2 with python 3.8

FAIL: /llvm/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp:196
GDB printed:
   'u"\\xd800\\xdd96\\x20ac\\243$"'
Value should match:
   'u"𐆖€£$"'
PASS: /llvm/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp:198
PASS: /llvm/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp:200
PASS: /llvm/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp:206
FAIL: /llvm/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp:209
GDB printed:
   'U"\\x1d552\\x1d553\\x1d554\\x1d555\\x1d556\\x1d557"'
Value should match:
   'U"𝕒𝕓𝕔𝕕𝕖𝕗"'
FAIL: /llvm/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp:213
GDB printed:
   'U"\\x4f60\\x597d"'
Value should match:
   'U"你好"'

A gdb-10.1 using python 2.7.18 that I built myself:

FAIL: /llvm/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp:196
GDB printed:
   u'u"\\xd800\\xdd96\\x20ac\\243$"'
Value should match:
   u'u"\U00010196\u20ac\xa3$"'
PASS: /llvm/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp:198
PASS: /llvm/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp:200
PASS: /llvm/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp:206
FAIL: /llvm/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp:209
GDB printed:
   u'U"\\x1d552\\x1d553\\x1d554\\x1d555\\x1d556\\x1d557"'
Value should match:
   u'U"\U0001d552\U0001d553\U0001d554\U0001d555\U0001d556\U0001d557"'
FAIL: /llvm/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp:213
GDB printed:
   u'U"\\x4f60\\x597d"'
Value should match:
   u'U"\u4f60\u597d"'

You can see that Python2 is closer, putting \x instead of \u but I don't know where that points the blame. After some digging I still don't have a good idea what layer would be at fault there and if you could even fix it nicely.

@saugustine If you or others have an idea of where the problem might lie I'm happy to look into it more, rather than disabling them.

I should note that the u16string tests were never enabled but that was just a mistake I think. And that this test cannot run on a gdb <8.2 due to some of the API it uses. So that added to my confusion and maybe the reason I can't find any examples of it passing.

At Linaro we're looking to move all of our bots to Focal and when you move CI you'll hit the same issue.

This looks very much like a locale issue to me. Do you happen to know which one is live in your test bed?

Not sure which parts are relevant but hopefully one of these.

$ locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
libcxx-builder@b200e5cf42aa:/tmp$ locale -c charmap
LC_CTYPE
ANSI_X3.4-1968

There is also the charset that gdb sees:

(gdb) show target-charset
The target character set is "auto; currently ANSI_X3.4-1968".
(gdb) show target-wide-charset
The target wide character set is "auto; currently UTF-32".

Changing my current locale to POSIX from en_US.utf-8 reproduces this problem. I just use the default ubuntu locale normally. I don't have ANSI_X3.4-1968 installed, but I suspect the issue is the same. I expect the tests will pass if you switch LANG=en_US.utf-8--but would be good to check.

Not entirely sure the best thing to do here that will be robust to unexpected locales. We can either update the tests to set the locale, or not run if the locale isn't what we expect. Or try a couple of different matching strings.

The locale tip was spot on.

I've gone with setting the locale just for the test. Luckily we
already have a feature check for that.

DavidSpickett retitled this revision from [libcxx][pretty printers] Skip u16/u32 string printer tests to [libcxx][pretty printers] Correct locale for u16/u32 string tests.Oct 6 2021, 2:33 AM
DavidSpickett edited the summary of this revision. (Show Details)

@ldionne Since the test won't run on bionic for other reasons, there's no immediate reason to update the CI container. If you'd like to get localization tests running on more of them, you could update it now.

Either way, Linaro's bots will move to Focal soon and be running this test.

I overstated the localization tests bit, with this we'll run some more but not all. For that you need fr/ru and zh as well.

Which I can work on adding but for now just en is relevant to the gdb tests.

DavidSpickett updated this revision to Diff 377552.EditedOct 6 2021, 8:08 AM
DavidSpickett retitled this revision from [libcxx][pretty printers] Correct locale for u16/u32 string tests to [libcxx][pretty printers] Correct locale for u16/u32 string tests.

Moving the dockerfile change into a separate diff that installs all the needed locales for the test suite.

https://reviews.llvm.org/D111235

DavidSpickett edited the summary of this revision. (Show Details)Oct 6 2021, 8:08 AM
ldionne accepted this revision as: ldionne.Oct 6 2021, 8:31 AM

FWIW this seems reasonable to me.

I'll let @saugustine give the final go-ahead since its their code!

saugustine accepted this revision.Oct 6 2021, 8:34 AM

Great change. Thanks--especially for fixing the u16_string test.

ldionne accepted this revision.Oct 6 2021, 10:29 AM

Ship it!

This revision is now accepted and ready to land.Oct 6 2021, 10:29 AM
This revision was landed with ongoing or failed builds.Oct 7 2021, 1:26 AM
This revision was automatically updated to reflect the committed changes.

Hi again! I'm actually hitting the same failures locally again now that u16string_test was added:

FAIL: Something is wrong in the test framework.
Converting character sets: Invalid argument.
FAIL: Something is wrong in the test framework.
Converting character sets: Invalid argument.
FAIL: Something is wrong in the test framework.
Converting character sets: Invalid argument.
FAIL: Something is wrong in the test framework.
Converting character sets: Invalid argument.

Is there anything else I can help do/provide to debug this?

I spent some time today trying to reproduce this in as many ways as I could think of, to no avail.

Let's just leave the 16-bit ones disabled for now. I'll work some more on it this weekend and I don't think it should block the very valid other changes here.