All codepoints whose East Asian Width property is Wide or Fullwidth
are considered double width, as are emoji and
symbols commonly rendered as emoji.
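As a rough illustration of the rule in the summary, here is a minimal sketch of a range-table lookup. It is not the code from this patch: `CodepointRange`, `DoubleWidthSample`, `isDoubleWidthSample` and `columnWidthSample` are hypothetical names, and the table is a tiny hand-picked subset of the Wide/Fullwidth ranges, whereas a real table would be generated from the Unicode data files.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical inclusive codepoint range; the name is illustrative only.
struct CodepointRange {
  char32_t Lower;
  char32_t Upper;
};

// Heavily abbreviated sample, sorted by codepoint and non-overlapping.
// A real table covers far more ranges than these.
constexpr CodepointRange DoubleWidthSample[] = {
    {0x1100, 0x115F},   // Hangul Jamo initial consonants (Wide)
    {0x3041, 0x3096},   // Hiragana (Wide)
    {0x4E00, 0x9FFF},   // CJK Unified Ideographs (Wide)
    {0xAC00, 0xD7A3},   // Hangul Syllables (Wide)
    {0xFF01, 0xFF60},   // Fullwidth ASCII variants (Fullwidth)
    {0x1F600, 0x1F64F}, // Emoticons (Wide)
};

// Binary search for the first range whose upper bound is not below C,
// then check that C is also at or above that range's lower bound.
bool isDoubleWidthSample(char32_t C) {
  const CodepointRange *End = std::end(DoubleWidthSample);
  const CodepointRange *It = std::lower_bound(
      std::begin(DoubleWidthSample), End, C,
      [](const CodepointRange &R, char32_t V) { return R.Upper < V; });
  return It != End && It->Lower <= C;
}

// Simplification: everything outside the table counts as one column.
// Zero-width and non-printable codepoints are deliberately ignored here.
int columnWidthSample(char32_t C) { return isDoubleWidthSample(C) ? 2 : 1; }
```

With this sample table, `columnWidthSample(U'A')` yields 1, while `columnWidthSample(0x4E2D)` and `columnWidthSample(0x1F600)` yield 2.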
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test |
---|---|
60,050 ms | x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test |
Event Timeline
I did that as a drive-by while investigating https://github.com/llvm/llvm-project/issues/54732#issuecomment-1324107610 (which turned out to work correctly after all).
There are some preexisting tests in llvm/unittests/Support/UnicodeTest.cpp which I might be able to extend with a sampling of Unicode 15 codepoints. I don't know how meaningful that would be, but as we discussed before, the only way to do exhaustive checking here would be to cross-check against an independent implementation.
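For reference, the cross-checking idea could look roughly like the sketch below. It assumes two width functions with the same signature are available; `WidthFn` and `findWidthMismatches` are hypothetical names and nothing here is taken from the patch or from UnicodeTest.cpp.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Shared signature for the two implementations being compared. In practice
// one would wrap the function under test and the other an independent oracle.
using WidthFn = std::function<int(char32_t)>;

// Walks every Unicode scalar value and collects the codepoints on which the
// two implementations disagree, reporting at most MaxReported of them.
std::vector<char32_t> findWidthMismatches(const WidthFn &UnderTest,
                                          const WidthFn &Oracle,
                                          std::size_t MaxReported = 16) {
  std::vector<char32_t> Mismatches;
  for (char32_t C = 0; C <= 0x10FFFF; ++C) {
    if (C >= 0xD800 && C <= 0xDFFF)
      continue; // Surrogates are not scalar values.
    if (UnderTest(C) == Oracle(C))
      continue;
    if (Mismatches.size() >= MaxReported)
      break; // Enough disagreements collected to diagnose the problem.
    Mismatches.push_back(C);
  }
  return Mismatches;
}
```

An empty result means the two functions agree on every scalar value, which is only as strong a guarantee as the oracle is independent.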
LGTM! Agreed that testing this would be somewhat meaningless without some sort of oracle we can reference. Probably should add a release note for the fix when landing?
The release notes have:
- Unicode support has been updated to support Unicode 15.0.
New unicode codepoints are supported as appropriate in diagnostics,
C and C++ identifiers, and escape sequences.
That seems sufficient.
The tests broke the build on some Windows platforms; I pushed a fix here: https://reviews.llvm.org/rG9fec67483d4c
Sorry to anyone who was impacted by that.