This is an archive of the discontinued LLVM Phabricator instance.

Differential D68418

[test] Remove locale dependency for mri-utf8.test
ClosedPublic

Authored by thopre on Oct 3 2019, 1:35 PM.

Details

Reviewers

gbreynoo
MaskRay
rupprecht
JamesNagurne
jfb

Commits

rGb455ebf921c5: [test] Remove locale dependency for mri-utf8.test
rL373700: [test] Remove locale dependency for mri-utf8.test

Summary

llvm-ar's mri-utf8.test relies on the en_US.UTF-8 locale being
installed for its last RUN line to work. If it is not installed, the Unicode
string gets encoded as ASCII, which fails because the pound sign (U+00A3)
is outside the ASCII range. This commit changes the call to open to
use a bytes literal of the UTF-8 encoding of the pound sign instead,
thus bypassing the encoding step.
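
A minimal sketch of the difference, assuming Python 3 on a POSIX system. The temporary directory and the helper name are illustrative and not part of the test. A bytes path is handed to the operating system as-is, whereas a str path first has to be encoded, which is the step that fails under an ASCII-only encoding.

```python
import os
import tempfile

POUND_UTF8 = b"\xc2\xa3.txt"  # UTF-8 encoding of u"\u00a3.txt"


def roundtrip_with_bytes_path(directory: bytes) -> bytes:
    """Write and read back a file whose name is given as raw bytes."""
    path = os.path.join(directory, POUND_UTF8)
    with open(path, "wb") as f:
        f.write(b"contents\n")
    with open(path, "rb") as f:
        return f.read()


# Encoding the same name as ASCII is what fails without a UTF-8 locale.
try:
    "\u00a3.txt".encode("ascii")
    ascii_encoding_failed = False
except UnicodeEncodeError:
    ascii_encoding_failed = True

with tempfile.TemporaryDirectory() as tmp:
    tmp_bytes = os.fsencode(tmp)
    data = roundtrip_with_bytes_path(tmp_bytes)
    names = os.listdir(tmp_bytes)  # bytes in, bytes out: no decoding step
```

Because the directory is also passed as bytes, no step in this sketch depends on the locale.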

Note that the echo that creates the <pound sign>.txt file works
regardless of the locale, because both the shell and echo (in case
it is not a shell builtin) only need to recognise ASCII
characters to operate. Indeed, the mri-utf8.test file (and in particular
the pound sign) is encoded in UTF-8, and UTF-8 guarantees that only ASCII
characters produce bytes that can be interpreted as ASCII characters
(i.e. bytes with the most significant bit clear).
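
The UTF-8 property relied on here can be checked with a short sketch. The function name is hypothetical and only serves the illustration: every byte produced by a non-ASCII character has its most significant bit set, so it can never be mistaken for '>', a space, or a newline.

```python
def non_ascii_chars_avoid_ascii_bytes(text: str) -> bool:
    """Check that only ASCII characters yield bytes with the high bit clear."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 0x80:
            # ASCII characters encode to exactly one byte, high bit clear.
            if len(encoded) != 1 or encoded[0] & 0x80:
                return False
        else:
            # Every byte of a multi-byte sequence has the high bit set.
            if any(b & 0x80 == 0 for b in encoded):
                return False
    return True


pound_bytes = "\u00a3".encode("utf-8")
run_line_ok = non_ascii_chars_avoid_ascii_bytes('echo "contents" > \u00a3.txt\n')
```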

So the process of extracting the filename from the line goes something
like this:

find an ASCII chevron '>'
find the beginning of the filename by skipping ASCII space-like characters
find the ASCII newline character indicating the end of the redirection (there is no semicolon ';', closing curly bracket '}', parenthesis ')' or the like)
create a file whose name is made of all the bytes between the beginning and end of the filename *without interpreting them*
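
The steps above can be sketched as a byte-level scan. This is a deliberately simplified illustration, not how any real shell or lit implements redirection: the function name is hypothetical, and quoting and '>>' are not handled.

```python
def redirect_target(line: bytes) -> bytes:
    """Return the raw bytes of the filename following the first '>'."""
    # Step 1: find the ASCII chevron.
    start = line.index(b">") + 1
    # Step 2: skip ASCII space-like characters before the name.
    while start < len(line) and line[start:start + 1] in (b" ", b"\t"):
        start += 1
    # Step 3: the name ends at the first ASCII terminator.
    end = start
    while end < len(line) and line[end:end + 1] not in (b"\n", b";", b"}", b")"):
        end += 1
    # Step 4: return the bytes in between, without decoding them.
    return line[start:end]


run_line = 'echo "contents" > \u00a3.txt\n'.encode("utf-8")
target = redirect_target(run_line)
```

Since only ASCII bytes are compared, the scan never needs to know which encoding the filename uses.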

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 38988
Build 38987: arc lint + arc unit

Event Timeline

thopre created this revision. Oct 3 2019, 1:35 PM

Herald added a project: Restricted Project. Oct 3 2019, 1:35 PM

Harbormaster completed remote builds in B38961: Diff 223081. Oct 3 2019, 1:36 PM

thopre added a reviewer: jfb. Oct 3 2019, 1:37 PM

Herald added a subscriber: dexonsmith. Oct 3 2019, 1:37 PM

Out of curiosity, what environment does this currently fail in?

llvm/test/tools/llvm-ar/mri-utf8.test
14–15	This seems to be mostly describing `env LANG=en_US.UTF-8` which is now removed, so the comment can go too.

This revision is now accepted and ready to land. Oct 3 2019, 5:50 PM

Remove now irrelevant comment

Harbormaster completed remote builds in B38988: Diff 223151. Oct 4 2019, 12:09 AM

thopre marked 2 inline comments as done. Oct 4 2019, 12:11 AM

thopre added inline comments.

llvm/test/tools/llvm-ar/mri-utf8.test
14–15	I'm not sure but I believe Ubuntu docker images for instance don't come with en_US.UTF-8 by default.

Closed by commit rL373700: [test] Remove locale dependency for mri-utf8.test (authored by thopre). Oct 4 2019, 12:11 AM

This revision was automatically updated to reflect the committed changes.

thopre marked an inline comment as done.

You are correct that the locale is required to pass on Linux. I had some trouble with this test, as the behaviour of Python in this area differs between Linux / Windows and Python 2 / Python 3. For example, this fix appears to be fine on Linux; however, Windows with Python 2 fails:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
IOError: [Errno 2] No such file or directory: '\xc2\xa3.txt'

I do not like the reliance on the locale; however, the upstream buildbots all appear to have it installed. Maybe the test should be split into a Windows test and a Linux test?
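
One plausible reading of the traceback above, offered as an assumption rather than a confirmed diagnosis: Python 2 on Windows passes byte-string paths to the ANSI file APIs, so the bytes are interpreted in the system code page instead of as UTF-8. The sketch below assumes code page 1252 and shows the name that would then be looked up.

```python
utf8_name = b"\xc2\xa3.txt"

# Assumption: the system ANSI code page is 1252.
as_seen_by_ansi_api = utf8_name.decode("cp1252")

# The name the test actually created on disk.
intended = utf8_name.decode("utf-8")

names_match = as_seen_by_ansi_api == intended
```

Under that assumption the lookup targets a two-character prefix (U+00C2 followed by U+00A3) rather than the single pound sign, which would explain the "No such file or directory" error.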

I reverted this change due to the clang-x64-windows-msvc build bot failure.

In reply to D68418#1694439 (@gbreynoo):

Hmm, I need a Windows system to investigate further then. I guess the current code will output the pound sign in whatever encoding Windows uses (which I guess is not UTF-8, but then how does the echo a few lines above create the file correctly?). Thanks for reverting the commit and sorry for the breakage.

In reply to D68418#1694539 (@thopre):

So it's lit that processes the redirection, and I guess the file will be read as UTF-8, decoded into Unicode, and then on Windows this will output a filename in UTF-16.

Does llvm-ar expect UTF-8 specifically in archive members or can it be anything (e.g. UTF-16)?
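
To make the two encodings in question concrete, here is a small sketch that is not taken from the review. It shows the same filename as the UTF-8 bytes stored in the test file and as the UTF-16 code units used by Windows wide-character file APIs.

```python
name = "\u00a3.txt"

# Bytes as they appear in the UTF-8 encoded test file.
utf8_bytes = name.encode("utf-8")

# Little-endian UTF-16, the form used by Windows wide-character APIs.
utf16_bytes = name.encode("utf-16-le")

# Both byte strings decode back to the same Unicode filename.
same_name = utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le")
```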

In reply to D68418#1694611 (@thopre):

Could you try this change on Windows instead:

RUN: env LANG=C.UTF-8 %python -c "assert open(u'\xA3.txt', 'rb').read() == b'contents\n'"
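
As a quick check of the literal used in this suggestion (a sketch, not part of the review): the short escape \xA3 and the long escape \U000000A3 from the original RUN line denote the same character, so the only behavioural difference between the two lines is the LANG value.

```python
short_form = u"\xA3.txt"
long_form = u"\U000000A3.txt"

# Both literals spell the same five-character filename.
literals_equal = short_form == long_form

# Encoded as UTF-8, the name matches the bytes literal from Diff 223151.
encoded = short_form.encode("utf-8")
```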

Diff 223151

llvm/test/tools/llvm-ar/mri-utf8.test

  # Test non-ascii archive members
  # XFAIL: system-darwin

  RUN: rm -rf %t && mkdir -p %t/extracted

  RUN: echo "contents" > %t/£.txt

  RUN: echo "CREATE %t/mri.ar" > %t/script.mri
  RUN: echo "ADDMOD %t/£.txt" >> %t/script.mri
  RUN: echo "SAVE" >> %t/script.mri

  RUN: llvm-ar -M < %t/script.mri
  RUN: cd %t/extracted && llvm-ar x %t/mri.ar

- # This works around problems launching processess that
- # include arguments with non-ascii characters.
- # Python on Linux defaults to ASCII encoding unless the
- # environment specifies otherwise, so it is explicitly set.
- RUN: env LANG=en_US.UTF-8 %python -c "assert open(u'\U000000A3.txt', 'rb').read() == b'contents\n'"
+ RUN: %python -c "assert open(b'\xC2\xA3.txt', 'rb').read() == b'contents\n'"