Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
I have face this problem long time ago.
This happens only:
- on non-English Windows (and could be fixed via system settings: Control Panel -> Clock,Language,and Region -> Region and Language -> Administrative ->Language for non-Unicode programs -> Change system locale -> English).
- with Visual Studio Compiler
Seems, there are other places in LLVM with this problem (unsure that this is a complete list):
https://github.com/llvm/llvm-project/blob/master/clang-tools-extra/clangd/unittests/SourceCodeTests.cpp#L53
https://github.com/llvm/llvm-project/blob/master/llvm/unittests/Support/JSONTest.cpp#L240
I think we should update cmake rules to explicitly tell the compilers that the source code language is utf8.
https://docs.microsoft.com/en-us/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8?view=vs-2019 seems to be the way for MSVC.
I am not sure if there's a way to change that for clang/gcc. I I believe they both require plain ascii or utf-8 anyways.
Using explicit UTF-8 string literals is a possible solution, but it makes the code a bit less readable. Another possible solution is to save the source file using UTF-8 with BOM, but this is confusing outside the Microsoft world (and it's very easy to remove the BOM by mistake).
I think passing /utf-8 to MSVC is the best solution for good interoperability with Clang.
I am not sure if there's a way to change that for clang/gcc. I I believe they both require plain ascii or utf-8 anyways.
Clang enforces UTF-8 everywhere so there's no need for additional configuration. Clang can also accept source files encoded in UTF-8 with BOM. I'm not sure about GCC, I think you need to enforce the encoding manually like in MSVC (see the -finput-charset and -fexec-charset options).
I recommend reading this article about how MSVC interprets the encoding of source files, the casuistic is a bit complex: https://devblogs.microsoft.com/cppblog/new-options-for-managing-character-sets-in-the-microsoft-cc-compiler/
This isn't the only place where we use such characters. While I'm not sure spelling these literally is a hill we want to die on, I think u8 literals are so poorly understood that trying to do this consistently isn't going to be that simple.
As noted, clang is always reading as utf-8. gcc is more complicated but we have no reports of this problem. So I think we should assume this is msvc-only and if we can find a CMake workaround we should take it.
Hmm, I see. From the looks of it, the solution for several projects would be
add_compile_options("$<$<C_COMPILER_ID:MSVC>:/utf-8>") add_compile_options("$<$<CXX_COMPILER_ID:MSVC>:/utf-8>")
But I'm not sure if it makes sense in our case and I don't see many add_compile_options in LLVM. Also, I don't have a way to test it out.
- I brought up the environment to reproduce this problem and can confirm, that this solution with add_compile_options works.
- Adding -DCMAKE_CXX_FLAGS="/utf-8" -DCMAKE_C_FLAGS="/utf-8" into cmake command also helps (so, maybe we can update build documentation for MSVC (here? https://llvm.org/docs/CMake.html#microsoft-visual-c ) instead of changes inside llvm/CMakeLists.txt)
llvm/CMakeLists.txt | ||
---|---|---|
604 ↗ | (On Diff #303754) | I think we could replace these two lines with the one line add_compile_options(/utf-8) Because we could not get here with non-MSVC compiler |
Sigh, this doesn't work because we have things like https://docs.microsoft.com/en-us/windows/win32/api/traceloggingprovider/nf-traceloggingprovider-tracelogging_define_provider and https://docs.microsoft.com/en-us/windows/win32/tracelogging/tracelogging-native-quick-start in Windows sanitizer code:
http://lab.llvm.org:8011/#/builders/127/builds/1161/steps/4/logs/stdio
I have double checked that I do not have such errors with Visual Studio 2019 (just built compiler-rt with your patch)
I am not sure is it a bug of VS 2017 or not, but according to official documentation execution-character-set is obsolete since VS 2015 update 2 ( https://docs.microsoft.com/en-us/cpp/preprocessor/execution-character-set?view=msvc-160 ), so it's strange to me why it's still in use inside VS 2017 system headers.