This is an archive of the discontinued LLVM Phabricator instance.

Remove conditional compilation for WCHAR support in libedit
ClosedPublic

Authored by nealsid on Jul 14 2021, 8:52 PM.

Details

Summary

This change moves to using narrow character types and libedit APIs in Editline, because those are the same types that the rest of LLVM/LLDB uses, and it's generally considered better practice to use UTF-8 encoded in char than it is to use wider characters. However, for character input, the change leaves in using a wchar to enable input of multi-byte characters.

Diff Detail

Event Timeline

nealsid created this revision.Jul 14 2021, 8:52 PM
nealsid requested review of this revision.Jul 14 2021, 8:52 PM
nealsid edited the summary of this revision. (Show Details)Jul 14 2021, 8:53 PM

I actually expected after the RFC that we would remove all the non-wchar code, but this seems also fine. I think this LGTM in general, but I feel a bit nervous about touching stuff that depends so much on OS/environment. What OS/environment did you test this patch on? Would be nice to have this tested on a few setups before landing.

Also I'm kinda curious if you found any docs/examples that explain whether mixing the wchar/char functions like we do now is actually supported in libedit? IIUC we call now el_wset and el_set on the same libedit instance. It feels like the wchar support in the FreeBSD port was some kind of parallel implementation to the normal char support, so I'm surprised that we can just mix those functions and everything still works fine (at least on my Linux machine this seems to work).

lldb/source/Host/common/Editline.cpp
552

The original code was actually LLVM-code style (no {} around single line ifs)

I actually expected after the RFC that we would remove all the non-wchar code, but this seems also fine. I think this LGTM in general, but I feel a bit nervous about touching stuff that depends so much on OS/environment. What OS/environment did you test this patch on? Would be nice to have this tested on a few setups before landing.

This was my mistake, sorry. I originally went the route in this patch and ran into some errors testing, so I switched to what I detailed in the RFC. But then I found the problem (I was using narrow chars for the GetCharacter callback when that actually isn't supported). Overall I think it is best to use narrow char and string rather than wide-char and wstring because that's more consistent with the rest of LLVM.

Regarding platforms, I tested on OS X Big Sur and Monterey, and Debian Linux inside a VM. Jan was able to build on Fedora. I'm happy to test on more platforms - FreeBSD, NetBSD perhaps?

Also I'm kinda curious if you found any docs/examples that explain whether mixing the wchar/char functions like we do now is actually supported in libedit? IIUC we call now el_wset and el_set on the same libedit instance. It feels like the wchar support in the FreeBSD port was some kind of parallel implementation to the normal char support, so I'm surprised that we can just mix those functions and everything still works fine (at least on my Linux machine this seems to work).

Yeah, the original source converts the parameters and calls el_w* functions when the narrow-char functions are called. This is also true on FreeBSD: https://github.com/freebsd/freebsd-src/blob/373ffc62c158e52cde86a5b934ab4a51307f9f2e/contrib/libedit/eln.c#L359

nealsid updated this revision to Diff 359533.Jul 16 2021, 11:16 PM

Reran clang-format and removed extra braces.

nealsid marked an inline comment as done.Jul 16 2021, 11:17 PM
teemperor accepted this revision.Jul 17 2021, 3:31 AM

I actually expected after the RFC that we would remove all the non-wchar code, but this seems also fine. I think this LGTM in general, but I feel a bit nervous about touching stuff that depends so much on OS/environment. What OS/environment did you test this patch on? Would be nice to have this tested on a few setups before landing.

This was my mistake, sorry. I originally went the route in this patch and ran into some errors testing, so I switched to what I detailed in the RFC. But then I found the problem (I was using narrow chars for the GetCharacter callback when that actually isn't supported). Overall I think it is best to use narrow char and string rather than wide-char and wstring because that's more consistent with the rest of LLVM.

I actually like this approach even better, so I'm glad this turned out the way it did!

Regarding platforms, I tested on OS X Big Sur and Monterey, and Debian Linux inside a VM. Jan was able to build on Fedora. I'm happy to test on more platforms - FreeBSD, NetBSD perhaps?

I think this is more than enough to test on before landing so this LGTM. FWIW, we do have a release branching in about 10 days or so, so you might want to wait with landing this just directly after that branch was made so that this spends a bit more time on ToT where we can find issues before going into a release. (I assume you don't care whether this makes it into the current release or not, but please correct me if i'm wrong).

Also I'm kinda curious if you found any docs/examples that explain whether mixing the wchar/char functions like we do now is actually supported in libedit? IIUC we call now el_wset and el_set on the same libedit instance. It feels like the wchar support in the FreeBSD port was some kind of parallel implementation to the normal char support, so I'm surprised that we can just mix those functions and everything still works fine (at least on my Linux machine this seems to work).

Yeah, the original source converts the parameters and calls el_w* functions when the narrow-char functions are called. This is also true on FreeBSD: https://github.com/freebsd/freebsd-src/blob/373ffc62c158e52cde86a5b934ab4a51307f9f2e/contrib/libedit/eln.c#L359

Cool, thanks for finding that out!

Anyway, I think this LGTM. Thanks for doing it! Also I can't recall if you have commit access, so please let me know if I should land this.

This revision is now accepted and ready to land.Jul 17 2021, 3:31 AM

I actually expected after the RFC that we would remove all the non-wchar code, but this seems also fine. I think this LGTM in general, but I feel a bit nervous about touching stuff that depends so much on OS/environment. What OS/environment did you test this patch on? Would be nice to have this tested on a few setups before landing.

This was my mistake, sorry. I originally went the route in this patch and ran into some errors testing, so I switched to what I detailed in the RFC. But then I found the problem (I was using narrow chars for the GetCharacter callback when that actually isn't supported). Overall I think it is best to use narrow char and string rather than wide-char and wstring because that's more consistent with the rest of LLVM.

I actually like this approach even better, so I'm glad this turned out the way it did!

Regarding platforms, I tested on OS X Big Sur and Monterey, and Debian Linux inside a VM. Jan was able to build on Fedora. I'm happy to test on more platforms - FreeBSD, NetBSD perhaps?

I think this is more than enough to test on before landing so this LGTM. FWIW, we do have a release branching in about 10 days or so, so you might want to wait with landing this just directly after that branch was made so that this spends a bit more time on ToT where we can find issues before going into a release. (I assume you don't care whether this makes it into the current release or not, but please correct me if i'm wrong).

Definitely +1 on baking it at HEAD for a bit - SGTM.

Also I'm kinda curious if you found any docs/examples that explain whether mixing the wchar/char functions like we do now is actually supported in libedit? IIUC we call now el_wset and el_set on the same libedit instance. It feels like the wchar support in the FreeBSD port was some kind of parallel implementation to the normal char support, so I'm surprised that we can just mix those functions and everything still works fine (at least on my Linux machine this seems to work).

Yeah, the original source converts the parameters and calls el_w* functions when the narrow-char functions are called. This is also true on FreeBSD: https://github.com/freebsd/freebsd-src/blob/373ffc62c158e52cde86a5b934ab4a51307f9f2e/contrib/libedit/eln.c#L359

Cool, thanks for finding that out!

Anyway, I think this LGTM. Thanks for doing it! Also I can't recall if you have commit access, so please let me know if I should land this.

I don't, but maybe I can start that conversation around commit access. For this patch, I'll watch for the next release branch cut and ask you about landing it then. Thanks!

Neal

nealsid updated this revision to Diff 363358.Aug 1 2021, 4:08 PM

Update against HEAD (I still need to do a bit more testing but wanted to get the buildbot results in the meantime)

nealsid updated this revision to Diff 364362.Aug 4 2021, 11:42 PM

Update against HEAD; tested on Debian Buster and Monterey

The release branch was recently made, so we can land this right now. Just give me a ping when I should merge this :)

The release branch was recently made, so we can land this right now. Just give me a ping when I should merge this :)

Thanks - I recently got commit access so I figured I'd try it out with this one, and I *think* it worked.

The release branch was recently made, so we can land this right now. Just give me a ping when I should merge this :)

Thanks - I recently got commit access so I figured I'd try it out with this one, and I *think* it worked.

Jepp, worked and congrats!

I fear I might also have to congratulate you on your first revert (sorry, couldn't resist that joke). TestUnicode is failing for me with this patch on Arch Linux:

Command Output (stdout):
--
lldb version 14.0.0 (git@github.com:Teemperor/llvm-project revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5)
  clang revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5
  llvm revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
FAIL: LLDB (/home/teemperor/work/ci/build/bin/clang-x86_64) :: test_unicode_input (TestUnicode.TestCase)
======================================================================
ERROR: test_unicode_input (TestUnicode.TestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 111, in expect_loop
    incoming = spawn.read_nonblocking(spawn.maxread, timeout)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/pty_spawn.py", line 482, in read_nonblocking
    raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/teemperor/work/ci/llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 149, in wrapper
    return func(*args, **kwargs)
  File "/home/teemperor/work/ci/llvm-project/lldb/test/API/iohandler/unicode/TestUnicode.py", line 25, in test_unicode_input
    self.expect(u'\u1234', substrs=[u"error: '\u1234' is not a valid command.".encode('utf-8')])
  File "/home/teemperor/work/ci/llvm-project/lldb/packages/Python/lldbsuite/test/lldbpexpect.py", line 64, in expect
    self.child.expect_exact(s)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/spawnbase.py", line 418, in expect_exact
    return exp.expect_loop(timeout)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 119, in expect_loop
    return self.timeout(e)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 82, in timeout
    raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x7f004e20a610>
command: /home/teemperor/work/ci/build/bin/lldb
args: ['/home/teemperor/work/ci/build/bin/lldb', '--no-lldbinit', '--no-use-colors', '-O', 'settings clear -all', '-O', 'settings set symbols.enable-external-lookup false', '-O', 'settings set target.inherit-tcc true', '-O', 'settings set target.detach-on-error false', '-O', 'settings set target.auto-apply-fixits false', '-O', 'settings set plugin.process.gdb-remote.packet-timeout 60', '-O', 'settings set symbols.clang-modules-cache-path "/home/teemperor/work/ci/build/lldb-test-build.noindex/module-cache-lldb/lldb-api"', '-O', 'settings set use-color false', '-O', 'setting set target.prefer-dynamic-value no-dynamic-values']
buffer (last 100 chars): b'\\U+1234\r\n(lldb) '
before (last 100 chars): b'\\U+1234\r\n(lldb) '
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 609226
child_fd: 6
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
    0: b"error: '\xe1\x88\xb4' is not a valid command."
Config=x86_64-/home/teemperor/work/ci/build/bin/clang
----------------------------------------------------------------------
Ran 1 test in 60.551s

:) Shoot. I definitely want to revert ASAP rather than keep the tree in a bad state. Just curious, since there are timeout messages in the log, would you be able to try rerunning just that one? I see this, too: 0: b"error: '\xe1\x88\xb4' is not a valid command. So I'm guessing the timeout is a red herring.

I'll create an Arch VM to test on for future patches. Thank you and sorry about that.

I fear I might also have to congratulate you on your first revert (sorry, couldn't resist that joke). TestUnicode is failing for me with this patch on Arch Linux:

Command Output (stdout):
--
lldb version 14.0.0 (git@github.com:Teemperor/llvm-project revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5)
  clang revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5
  llvm revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
FAIL: LLDB (/home/teemperor/work/ci/build/bin/clang-x86_64) :: test_unicode_input (TestUnicode.TestCase)
======================================================================
ERROR: test_unicode_input (TestUnicode.TestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 111, in expect_loop
    incoming = spawn.read_nonblocking(spawn.maxread, timeout)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/pty_spawn.py", line 482, in read_nonblocking
    raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/teemperor/work/ci/llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 149, in wrapper
    return func(*args, **kwargs)
  File "/home/teemperor/work/ci/llvm-project/lldb/test/API/iohandler/unicode/TestUnicode.py", line 25, in test_unicode_input
    self.expect(u'\u1234', substrs=[u"error: '\u1234' is not a valid command.".encode('utf-8')])
  File "/home/teemperor/work/ci/llvm-project/lldb/packages/Python/lldbsuite/test/lldbpexpect.py", line 64, in expect
    self.child.expect_exact(s)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/spawnbase.py", line 418, in expect_exact
    return exp.expect_loop(timeout)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 119, in expect_loop
    return self.timeout(e)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 82, in timeout
    raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x7f004e20a610>
command: /home/teemperor/work/ci/build/bin/lldb
args: ['/home/teemperor/work/ci/build/bin/lldb', '--no-lldbinit', '--no-use-colors', '-O', 'settings clear -all', '-O', 'settings set symbols.enable-external-lookup false', '-O', 'settings set target.inherit-tcc true', '-O', 'settings set target.detach-on-error false', '-O', 'settings set target.auto-apply-fixits false', '-O', 'settings set plugin.process.gdb-remote.packet-timeout 60', '-O', 'settings set symbols.clang-modules-cache-path "/home/teemperor/work/ci/build/lldb-test-build.noindex/module-cache-lldb/lldb-api"', '-O', 'settings set use-color false', '-O', 'setting set target.prefer-dynamic-value no-dynamic-values']
buffer (last 100 chars): b'\\U+1234\r\n(lldb) '
before (last 100 chars): b'\\U+1234\r\n(lldb) '
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 609226
child_fd: 6
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
    0: b"error: '\xe1\x88\xb4' is not a valid command."
Config=x86_64-/home/teemperor/work/ci/build/bin/clang
----------------------------------------------------------------------
Ran 1 test in 60.551s

I ran it a few times and I'm sure it's an actual '"the output didn't show up" timeout and not a random failure. I also checked and my node doesn't have an .editrc file or something like that on the node. It's failing on a different Arch Linux bot so I think it's not a borked machine.

Arch should use just the vanilla libedit so I am not sure why it would be different from other distros. IIRC I ran all the tests when I first reviewed this, so I am also a bit confused why this is now failing. When I manually input unicode characters into the old and new prompt both seem to work, so maybe it's the pexpect test setup that is somehow dropping the input?

On a side note: I think this patch actually breaks the way backspace works. With this patch applied, if I enter a command such as '∂∂∂∂∂∂∂∂∂' and then press backspace, the cursor is now moved to the right by (what I assume) is the amount of bytes that I entered. I think there is some strlen logic going on that isn't unicode aware. See below:

:) Shoot. I definitely want to revert ASAP rather than keep the tree in a bad state. Just curious, since there are timeout messages in the log, would you be able to try rerunning just that one? I see this, too: 0: b"error: '\xe1\x88\xb4' is not a valid command. So I'm guessing the timeout is a red herring.

I'll create an Arch VM to test on for future patches. Thank you and sorry about that.

I fear I might also have to congratulate you on your first revert (sorry, couldn't resist that joke). TestUnicode is failing for me with this patch on Arch Linux:

Command Output (stdout):
--
lldb version 14.0.0 (git@github.com:Teemperor/llvm-project revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5)
  clang revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5
  llvm revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
FAIL: LLDB (/home/teemperor/work/ci/build/bin/clang-x86_64) :: test_unicode_input (TestUnicode.TestCase)
======================================================================
ERROR: test_unicode_input (TestUnicode.TestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 111, in expect_loop
    incoming = spawn.read_nonblocking(spawn.maxread, timeout)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/pty_spawn.py", line 482, in read_nonblocking
    raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/teemperor/work/ci/llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 149, in wrapper
    return func(*args, **kwargs)
  File "/home/teemperor/work/ci/llvm-project/lldb/test/API/iohandler/unicode/TestUnicode.py", line 25, in test_unicode_input
    self.expect(u'\u1234', substrs=[u"error: '\u1234' is not a valid command.".encode('utf-8')])
  File "/home/teemperor/work/ci/llvm-project/lldb/packages/Python/lldbsuite/test/lldbpexpect.py", line 64, in expect
    self.child.expect_exact(s)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/spawnbase.py", line 418, in expect_exact
    return exp.expect_loop(timeout)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 119, in expect_loop
    return self.timeout(e)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 82, in timeout
    raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x7f004e20a610>
command: /home/teemperor/work/ci/build/bin/lldb
args: ['/home/teemperor/work/ci/build/bin/lldb', '--no-lldbinit', '--no-use-colors', '-O', 'settings clear -all', '-O', 'settings set symbols.enable-external-lookup false', '-O', 'settings set target.inherit-tcc true', '-O', 'settings set target.detach-on-error false', '-O', 'settings set target.auto-apply-fixits false', '-O', 'settings set plugin.process.gdb-remote.packet-timeout 60', '-O', 'settings set symbols.clang-modules-cache-path "/home/teemperor/work/ci/build/lldb-test-build.noindex/module-cache-lldb/lldb-api"', '-O', 'settings set use-color false', '-O', 'setting set target.prefer-dynamic-value no-dynamic-values']
buffer (last 100 chars): b'\\U+1234\r\n(lldb) '
before (last 100 chars): b'\\U+1234\r\n(lldb) '
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 609226
child_fd: 6
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
    0: b"error: '\xe1\x88\xb4' is not a valid command."
Config=x86_64-/home/teemperor/work/ci/build/bin/clang
----------------------------------------------------------------------
Ran 1 test in 60.551s

Yikes - issue count starting to slowly increase :) I've reverted the patch, and sorry again about that.

I have to travel for a couple days but when I get my dev machine back up later this week I will investigate these issues in detail. I focussed my testing more on Unicode identifiers inside source files/debug information and missed the backspace problem, unfortunately. Thanks.

I ran it a few times and I'm sure it's an actual '"the output didn't show up" timeout and not a random failure. I also checked and my node doesn't have an .editrc file or something like that on the node. It's failing on a different Arch Linux bot so I think it's not a borked machine.

Arch should use just the vanilla libedit so I am not sure why it would be different from other distros. IIRC I ran all the tests when I first reviewed this, so I am also a bit confused why this is now failing. When I manually input unicode characters into the old and new prompt both seem to work, so maybe it's the pexpect test setup that is somehow dropping the input?

On a side note: I think this patch actually breaks the way backspace works. With this patch applied, if I enter a command such as '∂∂∂∂∂∂∂∂∂' and then press backspace, the cursor is now moved to the right by (what I assume) is the amount of bytes that I entered. I think there is some strlen logic going on that isn't unicode aware. See below:

:) Shoot. I definitely want to revert ASAP rather than keep the tree in a bad state. Just curious, since there are timeout messages in the log, would you be able to try rerunning just that one? I see this, too: 0: b"error: '\xe1\x88\xb4' is not a valid command. So I'm guessing the timeout is a red herring.

I'll create an Arch VM to test on for future patches. Thank you and sorry about that.

I fear I might also have to congratulate you on your first revert (sorry, couldn't resist that joke). TestUnicode is failing for me with this patch on Arch Linux:

Command Output (stdout):
--
lldb version 14.0.0 (git@github.com:Teemperor/llvm-project revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5)
  clang revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5
  llvm revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
FAIL: LLDB (/home/teemperor/work/ci/build/bin/clang-x86_64) :: test_unicode_input (TestUnicode.TestCase)
======================================================================
ERROR: test_unicode_input (TestUnicode.TestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 111, in expect_loop
    incoming = spawn.read_nonblocking(spawn.maxread, timeout)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/pty_spawn.py", line 482, in read_nonblocking
    raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/teemperor/work/ci/llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 149, in wrapper
    return func(*args, **kwargs)
  File "/home/teemperor/work/ci/llvm-project/lldb/test/API/iohandler/unicode/TestUnicode.py", line 25, in test_unicode_input
    self.expect(u'\u1234', substrs=[u"error: '\u1234' is not a valid command.".encode('utf-8')])
  File "/home/teemperor/work/ci/llvm-project/lldb/packages/Python/lldbsuite/test/lldbpexpect.py", line 64, in expect
    self.child.expect_exact(s)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/spawnbase.py", line 418, in expect_exact
    return exp.expect_loop(timeout)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 119, in expect_loop
    return self.timeout(e)
  File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 82, in timeout
    raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x7f004e20a610>
command: /home/teemperor/work/ci/build/bin/lldb
args: ['/home/teemperor/work/ci/build/bin/lldb', '--no-lldbinit', '--no-use-colors', '-O', 'settings clear -all', '-O', 'settings set symbols.enable-external-lookup false', '-O', 'settings set target.inherit-tcc true', '-O', 'settings set target.detach-on-error false', '-O', 'settings set target.auto-apply-fixits false', '-O', 'settings set plugin.process.gdb-remote.packet-timeout 60', '-O', 'settings set symbols.clang-modules-cache-path "/home/teemperor/work/ci/build/lldb-test-build.noindex/module-cache-lldb/lldb-api"', '-O', 'settings set use-color false', '-O', 'setting set target.prefer-dynamic-value no-dynamic-values']
buffer (last 100 chars): b'\\U+1234\r\n(lldb) '
before (last 100 chars): b'\\U+1234\r\n(lldb) '
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 609226
child_fd: 6
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
    0: b"error: '\xe1\x88\xb4' is not a valid command."
Config=x86_64-/home/teemperor/work/ci/build/bin/clang
----------------------------------------------------------------------
Ran 1 test in 60.551s

Sorry, I haven't forgotten about this. Haven't had time to sit down for a proper debugging session this or last week. I definitely still plan to update and resend for review. Thanks!