Differential D107202
ConvertUTF: convertUTF32ToUTF8String
Abandoned, Public
Authored by MarcusJohnson91 on Jul 30 2021, 4:19 PM.
Details
Summary
New since git am was fucked somehow. Maybe it's the pre-commit hook I created to run clang-format-diff.py that's fucking everything up?
Event Timeline
What BOM handling? There is no BOM function; bytes are swapped in the converter if the byte order isn't correct. Is that what you mean? I copied SrcBytes.size() * UNI_MAX_UTF8_BYTES_PER_CODE_POINT + 1 from the UTF-16 version. Are you asking me to change the UTF-16 version too?
I mean the behavior handling strings that contain UNI_UTF32_BYTE_ORDER_MARK_SWAPPED. I suspect a lot of places don't want the BOM handling to trigger. This includes trying to print diagnostics for wprintf, since the underlying function doesn't have any BOM handling. But I guess it's unlikely to matter in practice.
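To make the BOM behavior under discussion concrete, here is a minimal sketch of what "BOM handling" means for a UTF-32 input. It is not the code from this revision. The constants kBomNative and kBomSwapped are local stand-ins assumed to carry the same values as UNI_UTF32_BYTE_ORDER_MARK_NATIVE and UNI_UTF32_BYTE_ORDER_MARK_SWAPPED, and normalizeUTF32 and byteSwap32 are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-ins; assumed to match the values of the LLVM macros
// UNI_UTF32_BYTE_ORDER_MARK_NATIVE and UNI_UTF32_BYTE_ORDER_MARK_SWAPPED.
constexpr uint32_t kBomNative = 0x0000FEFFu;
constexpr uint32_t kBomSwapped = 0xFFFE0000u;

// Reverse the byte order of one 32-bit code unit.
constexpr uint32_t byteSwap32(uint32_t V) {
  return ((V & 0x000000FFu) << 24) | ((V & 0x0000FF00u) << 8) |
         ((V & 0x00FF0000u) >> 8) | ((V & 0xFF000000u) >> 24);
}

// Hypothetical helper: strip a leading BOM and, if the BOM was the
// byte-swapped one, swap every following code unit into native order.
// Input without a leading BOM is returned unchanged.
std::vector<uint32_t> normalizeUTF32(const std::vector<uint32_t> &Src) {
  std::vector<uint32_t> Out;
  if (Src.empty())
    return Out;

  std::size_t Start = 0;
  bool Swap = false;
  if (Src[0] == kBomNative) {
    Start = 1;
  } else if (Src[0] == kBomSwapped) {
    Start = 1;
    Swap = true;
  }

  Out.reserve(Src.size() - Start);
  for (std::size_t I = Start; I < Src.size(); ++I)
    Out.push_back(Swap ? byteSwap32(Src[I]) : Src[I]);
  return Out;
}
```

The concern raised above is that a caller such as a wprintf diagnostic may not want this step at all: a string that happens to start with 0xFFFE0000 would be silently reinterpreted, while the underlying function would print it as-is.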
I've written my own Unicode encoder/decoder before, so I'm familiar with how it works. You can store regular ASCII in a UTF-32 string: "Example" as UTF-32 would be 7 * 4 = 28 bytes (not counting the null terminator), whereas it would be just 7 bytes in UTF-8. It looks like the std::string is being compacted afterwards with Out.resize(reinterpret_cast<char *>(Dst) - &Out[0]); but maybe a call to Out.shrink_to_fit() at the end is warranted?
The way the math is written now, for "Example", we allocate UNI_MAX_UTF8_BYTES_PER_CODE_POINT * sizeof(UTF32) * 7 = 112 bytes.
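The sizing arithmetic in this exchange can be checked with a small sketch. It assumes UNI_MAX_UTF8_BYTES_PER_CODE_POINT is 4 and a 4-byte UTF32 type; kMaxUTF8BytesPerCodePoint and all three function names are hypothetical, not taken from the patch. The first function mirrors the formula copied from the UTF-16 version (which includes the + 1), the second scales by code point count instead of byte count, and the third computes what the output actually needs.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed to mirror UNI_MAX_UTF8_BYTES_PER_CODE_POINT.
constexpr std::size_t kMaxUTF8BytesPerCodePoint = 4;

// Sizing as discussed in the review: source BYTE count times the maximum
// UTF-8 length, plus one.
constexpr std::size_t sizeFromByteCount(std::size_t SrcBytes) {
  return SrcBytes * kMaxUTF8BytesPerCodePoint + 1;
}

// Tighter worst case: source CODE POINT count times the maximum UTF-8
// length, plus one.
constexpr std::size_t sizeFromCodePointCount(std::size_t SrcBytes) {
  return (SrcBytes / sizeof(uint32_t)) * kMaxUTF8BytesPerCodePoint + 1;
}

// Number of UTF-8 bytes one code point occupies (assumes a valid scalar).
constexpr std::size_t utf8Length(uint32_t CP) {
  return CP < 0x80 ? 1 : CP < 0x800 ? 2 : CP < 0x10000 ? 3 : 4;
}

// Exact UTF-8 byte count for a buffer of N code points.
constexpr std::size_t exactUTF8Size(const uint32_t *Src, std::size_t N) {
  std::size_t Total = 0;
  for (std::size_t I = 0; I < N; ++I)
    Total += utf8Length(Src[I]);
  return Total;
}

// "Example" is 7 code points, i.e. 28 bytes of UTF-32.
static_assert(sizeFromByteCount(7 * sizeof(uint32_t)) == 113,
              "112 bytes plus the extra one");
static_assert(sizeFromCodePointCount(7 * sizeof(uint32_t)) == 29,
              "28 bytes plus the extra one");
```

Under these assumptions, the byte-count formula reserves four times more than the worst case requires, and the trailing Out.resize trims the logical size down to the bytes actually written.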
Alright, I'm gonna give it a try and re-run the tests
Revision Contents
Diff 387786
llvm/include/llvm/Support/ConvertUTF.h
llvm/lib/Support/ConvertUTFWrapper.cpp
llvm/unittests/Support/ConvertUTFTest.cpp