This is an archive of the discontinued LLVM Phabricator instance.

[lit] Prevent hang when lit sees non-ASCII characters
ClosedPublic

Authored by richard.barton.arm on Jun 29 2020, 3:54 AM.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2020, 3:54 AM

Please add a test to lit's test suite, perhaps something like llvm/utils/lit/tests/Inputs/shtest-shell/stdout-encoding.txt.

Thanks for working on this.

llvm/utils/lit/lit/display.py
89–92

Please add a comment documenting the platform where this problem can be seen. Include the python version. If people try to simplify this subtle passage of code later, perhaps to abandon python 2 support, such comments should prove helpful.

Hi @jdenny

Agreed on adding a comment and test - the original patch was a bit information-light, sorry about that.

I am really struggling to figure out how to add a test for this and wonder if you can help/advise.

My environment is python 3.5, ubuntu 16.04 LTS, bash shell with this locale:

LANG=en_US.UTF-8
LANGUAGE=en_US:
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C

In my environment, the test added under D69207 - shtest-shell/stdout-encoding.txt - hits the UnicodeDecodeError when I run it directly with llvm-lit -a. Setting LC_ALL=en_US.utf-8 avoids the error, obviously. When running the test via make with LIT_OPTS="-a --filter=shtest-shell" make check-lit it does not fail, which is, I assume, why the problem has gone un-detected. I cannot work out how to build a test that runs the 'inner lit' in such an environment where the error would be raised. I can make a test that is adapted from shtest-shell/stdout-encoding.txt with this RUN line

# RUN: not env PYTHONIOENCODING=ascii %{lit} -j 1 -v %{inputs}/shtest-shell-ascii > %t.out

which will trigger the error for me when run with llvm-lit without -a (although the failure mode is to hang, so pretty nasty) but this test passes in make check-all.

I can continue to look further but can you see a trick I am missing? Alternatively, am I on a fool's errand here and a new test will not be possible or practical?

jdenny added a comment.Jul 6 2020, 4:33 PM

Hi. Sorry for the delay.

I can make a test that is adapted from shtest-shell/stdout-encoding.txt with this RUN line

# RUN: not env PYTHONIOENCODING=ascii %{lit} -j 1 -v %{inputs}/shtest-shell-ascii > %t.out

which will trigger the error for me when run with llvm-lit without -a (although the failure mode is to hang, so pretty nasty)

Thanks for finding this reproducer!

but this test passes in make check-all.

By "passes", I think you're saying that it doesn't reproduce the error. Right?

If so, I suspect that llvm-lit called directly uses python2 on your system, but check-all uses python3. Can you confirm?

I also suspect this is a python2-specific bug.

By "passes", I think you're saying that it doesn't reproduce the error. Right?

Yes, sorry. I was trying to write a test that would fail _without_ my patch applied so had got myself turned around.

If so, I suspect that llvm-lit called directly uses python2 on your system, but check-all uses python3. Can you confirm?
I also suspect this is a python2-specific bug.

Spot on - I can confirm this is what is happening. Thanks for clearing that up for me.

I will push an update with a new comment and my new test. I guess the evilness of it when/if it were to regress is ok - make sure we don't regress I suppose!

Add test and comment.

Arg - finger trouble using arc!

I pushed the Flang test that I was writing that made me stumble across this
issue as part of this patch by mistake. Remove it from the new version.

I will push an update with a new comment and my new test.

Thanks!

I guess the evilness of it when/if it were to regress is ok - make sure we don't regress I suppose!

I agree that the possibility of regressions causing hangs is not nice, but I think it's better for them to occur in our test suite immediately than in the wild later.

llvm/utils/lit/lit/display.py
89–92

When I first read this new comment, I thought it might be repeating the info from the earlier comment but at a more specific location in the code. To make it clear this is a different issue, I'd prefer "can raise UnicodeDecodeError" -> "can raise UnicodeDecodeError here too".

Don't forget the . on the last sentence.

llvm/utils/lit/tests/shtest-shell-ascii.py
7 ↗(On Diff #276096)

This test mostly copies a piece of shtest-shell.py but with PYTHONIOENCODING=ascii. In doing so, it separates the original shtest-encoding.txt from the new one even though they're covering related bugs.

I wonder if it would be better to keep everything together and avoid duplication by just extending shtest-shell.py with these new RUN: lines. A comment could explain that shtest-encoding.txt is the main focus but that the other tests are covered by these new RUN: lines too just in case a problem crops up in them.

What do you think?

richard.barton.arm marked 2 inline comments as done.

Updated version taking into account review comments on test and comment.

Actually delete the previous new test (rebasing issues - sorry)

richard.barton.arm marked an inline comment as done.Jul 14 2020, 10:29 AM
richard.barton.arm added inline comments.
llvm/utils/lit/tests/shtest-shell-ascii.py
7 ↗(On Diff #276096)

Agreed. We have to slightly relax the expected output to work for both though as now the shell will skip a character rather than emit one.

jdenny accepted this revision.Jul 14 2020, 10:35 AM
jdenny marked an inline comment as done.

LGTM. Thanks for the changes!

llvm/utils/lit/tests/shtest-shell-ascii.py
7 ↗(On Diff #276096)

Ah, I didn't think of that. But it's probably fine.

This revision is now accepted and ready to land.Jul 14 2020, 10:35 AM
This revision was automatically updated to reflect the committed changes.