This change moves to using narrow character types and libedit APIs in Editline, because those are the same types that the rest of LLVM/LLDB uses, and it's generally considered better practice to use UTF-8 encoded in char than it is to use wider characters. However, for character input, the change leaves in using a wchar to enable input of multi-byte characters.
Details
- Reviewers
teemperor - Group Reviewers
Restricted Project - Commits
- rG7529f0e3e142: D106035: Remove conditional compilation for WCHAR support in libedit
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I actually expected after the RFC that we would remove all the non-wchar code, but this seems also fine. I think this LGTM in general, but I feel a bit nervous about touching stuff that depends so much on OS/environment. What OS/environment did you test this patch on? Would be nice to have this tested on a few setups before landing.
Also I'm kinda curious if you found any docs/examples that explain whether mixing the wchar/char functions like we do now is actually supported in libedit? IIUC we call now el_wset and el_set on the same libedit instance. It feels like the wchar support in the FreeBSD port was some kind of parallel implementation to the normal char support, so I'm surprised that we can just mix those functions and everything still works fine (at least on my Linux machine this seems to work).
lldb/source/Host/common/Editline.cpp | ||
---|---|---|
552 | The original code was actually LLVM-code style (no {} around single line ifs) |
This was my mistake, sorry. I originally went the route in this patch and ran into some errors testing, so I switched to what I detailed in the RFC. But then I found the problem (I was using narrow chars for the GetCharacter callback when that actually isn't supported). Overall I think it is best to use narrow char and string rather than wide-char and wstring because that's more consistent with the rest of LLVM.
Regarding platforms, I tested on OS X Big Sur and Monterey, and Debian Linux inside a VM. Jan was able to build on Fedora. I'm happy to test on more platforms - FreeBSD, NetBSD perhaps?
Also I'm kinda curious if you found any docs/examples that explain whether mixing the wchar/char functions like we do now is actually supported in libedit? IIUC we call now el_wset and el_set on the same libedit instance. It feels like the wchar support in the FreeBSD port was some kind of parallel implementation to the normal char support, so I'm surprised that we can just mix those functions and everything still works fine (at least on my Linux machine this seems to work).
Yeah, the original source converts the parameters and calls el_w* functions when the narrow-char functions are called. This is also true on FreeBSD: https://github.com/freebsd/freebsd-src/blob/373ffc62c158e52cde86a5b934ab4a51307f9f2e/contrib/libedit/eln.c#L359
I actually like this approach even better, so I'm glad this turned out the way it did!
Regarding platforms, I tested on OS X Big Sur and Monterey, and Debian Linux inside a VM. Jan was able to build on Fedora. I'm happy to test on more platforms - FreeBSD, NetBSD perhaps?
I think this is more than enough to test on before landing so this LGTM. FWIW, we do have a release branching in about 10 days or so, so you might want to wait with landing this just directly after that branch was made so that this spends a bit more time on ToT where we can find issues before going into a release. (I assume you don't care whether this makes it into the current release or not, but please correct me if i'm wrong).
Also I'm kinda curious if you found any docs/examples that explain whether mixing the wchar/char functions like we do now is actually supported in libedit? IIUC we call now el_wset and el_set on the same libedit instance. It feels like the wchar support in the FreeBSD port was some kind of parallel implementation to the normal char support, so I'm surprised that we can just mix those functions and everything still works fine (at least on my Linux machine this seems to work).
Yeah, the original source converts the parameters and calls el_w* functions when the narrow-char functions are called. This is also true on FreeBSD: https://github.com/freebsd/freebsd-src/blob/373ffc62c158e52cde86a5b934ab4a51307f9f2e/contrib/libedit/eln.c#L359
Cool, thanks for finding that out!
Anyway, I think this LGTM. Thanks for doing it! Also I can't recall if you have commit access, so please let me know if I should land this.
Definitely +1 on baking it at HEAD for a bit - SGTM.
Also I'm kinda curious if you found any docs/examples that explain whether mixing the wchar/char functions like we do now is actually supported in libedit? IIUC we call now el_wset and el_set on the same libedit instance. It feels like the wchar support in the FreeBSD port was some kind of parallel implementation to the normal char support, so I'm surprised that we can just mix those functions and everything still works fine (at least on my Linux machine this seems to work).
Yeah, the original source converts the parameters and calls el_w* functions when the narrow-char functions are called. This is also true on FreeBSD: https://github.com/freebsd/freebsd-src/blob/373ffc62c158e52cde86a5b934ab4a51307f9f2e/contrib/libedit/eln.c#L359
Cool, thanks for finding that out!
Anyway, I think this LGTM. Thanks for doing it! Also I can't recall if you have commit access, so please let me know if I should land this.
I don't, but maybe I can start that conversation around commit access. For this patch, I'll watch for the next release branch cut and ask you about landing it then. Thanks!
Neal
Update against HEAD (I still need to do a bit more testing but wanted to get the buildbot results in the meantime)
The release branch was recently made, so we can land this right now. Just give me a ping when I should merge this :)
Thanks - I recently got commit access so I figured I'd try it out with this one, and I *think* it worked.
I fear I might also have to congratulate you on your first revert (sorry, couldn't resist that joke). TestUnicode is failing for me with this patch on Arch Linux:
Command Output (stdout): -- lldb version 14.0.0 (git@github.com:Teemperor/llvm-project revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5) clang revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5 llvm revision f9f63e92243ecd6d0df5d6813967a49c8ca79ca5 Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc'] -- Command Output (stderr): -- FAIL: LLDB (/home/teemperor/work/ci/build/bin/clang-x86_64) :: test_unicode_input (TestUnicode.TestCase) ====================================================================== ERROR: test_unicode_input (TestUnicode.TestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 111, in expect_loop incoming = spawn.read_nonblocking(spawn.maxread, timeout) File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/pty_spawn.py", line 482, in read_nonblocking raise TIMEOUT('Timeout exceeded.') pexpect.exceptions.TIMEOUT: Timeout exceeded. During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/teemperor/work/ci/llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 149, in wrapper return func(*args, **kwargs) File "/home/teemperor/work/ci/llvm-project/lldb/test/API/iohandler/unicode/TestUnicode.py", line 25, in test_unicode_input self.expect(u'\u1234', substrs=[u"error: '\u1234' is not a valid command.".encode('utf-8')]) File "/home/teemperor/work/ci/llvm-project/lldb/packages/Python/lldbsuite/test/lldbpexpect.py", line 64, in expect self.child.expect_exact(s) File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/spawnbase.py", line 418, in expect_exact return exp.expect_loop(timeout) File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 119, in expect_loop return self.timeout(e) File "/home/teemperor/work/ci/llvm-project/lldb/third_party/Python/module/pexpect-4.6/pexpect/expect.py", line 82, in timeout raise TIMEOUT(msg) pexpect.exceptions.TIMEOUT: Timeout exceeded. <pexpect.pty_spawn.spawn object at 0x7f004e20a610> command: /home/teemperor/work/ci/build/bin/lldb args: ['/home/teemperor/work/ci/build/bin/lldb', '--no-lldbinit', '--no-use-colors', '-O', 'settings clear -all', '-O', 'settings set symbols.enable-external-lookup false', '-O', 'settings set target.inherit-tcc true', '-O', 'settings set target.detach-on-error false', '-O', 'settings set target.auto-apply-fixits false', '-O', 'settings set plugin.process.gdb-remote.packet-timeout 60', '-O', 'settings set symbols.clang-modules-cache-path "/home/teemperor/work/ci/build/lldb-test-build.noindex/module-cache-lldb/lldb-api"', '-O', 'settings set use-color false', '-O', 'setting set target.prefer-dynamic-value no-dynamic-values'] buffer (last 100 chars): b'\\U+1234\r\n(lldb) ' before (last 100 chars): b'\\U+1234\r\n(lldb) ' after: <class 'pexpect.exceptions.TIMEOUT'> match: None match_index: None exitstatus: None flag_eof: False pid: 609226 child_fd: 6 closed: False timeout: 60 delimiter: <class 'pexpect.exceptions.EOF'> logfile: None logfile_read: None logfile_send: None maxread: 2000 ignorecase: False searchwindowsize: None delaybeforesend: 0.05 delayafterclose: 0.1 delayafterterminate: 0.1 searcher: searcher_string: 0: b"error: '\xe1\x88\xb4' is not a valid command." Config=x86_64-/home/teemperor/work/ci/build/bin/clang ---------------------------------------------------------------------- Ran 1 test in 60.551s
:) Shoot. I definitely want to revert ASAP rather than keep the tree in a bad state. Just curious, since there are timeout messages in the log, would you be able to try rerunning just that one? I see this, too: 0: b"error: '\xe1\x88\xb4' is not a valid command. So I'm guessing the timeout is a red herring.
I'll create an Arch VM to test on for future patches. Thank you and sorry about that.
I ran it a few times and I'm sure it's an actual '"the output didn't show up" timeout and not a random failure. I also checked and my node doesn't have an .editrc file or something like that on the node. It's failing on a different Arch Linux bot so I think it's not a borked machine.
Arch should use just the vanilla libedit so I am not sure why it would be different from other distros. IIRC I ran all the tests when I first reviewed this, so I am also a bit confused why this is now failing. When I manually input unicode characters into the old and new prompt both seem to work, so maybe it's the pexpect test setup that is somehow dropping the input?
On a side note: I think this patch actually breaks the way backspace works. With this patch applied, if I enter a command such as '∂∂∂∂∂∂∂∂∂' and then press backspace, the cursor is now moved to the right by (what I assume) is the amount of bytes that I entered. I think there is some strlen logic going on that isn't unicode aware. See below:
Yikes - issue count starting to slowly increase :) I've reverted the patch, and sorry again about that.
I have to travel for a couple days but when I get my dev machine back up later this week I will investigate these issues in detail. I focussed my testing more on Unicode identifiers inside source files/debug information and missed the backspace problem, unfortunately. Thanks.
Sorry, I haven't forgotten about this. Haven't had time to sit down for a proper debugging session this or last week. I definitely still plan to update and resend for review. Thanks!
The original code was actually LLVM-code style (no {} around single line ifs)