I don't love this change, but it prevents crashing when indexing boost headers,
and I can't think of a better practical alternative.
Based on a patch by AnakinZheng!
Differential D81530
[clangd] Log rather than assert on bad UTF-8.
Authored by sammccall on Jun 9 2020, 9:41 PM.
Event Timeline

LGTM, thanks! (i bet you rushed it already for the cherry-pick, but just wanted to remind again that we should :D)

I can confirm that this commit works equally well for the UTF-8 assertion failure. Thank you! Is there some way to easily modify the source to produce a diagnostic pointing to the issue's source location? I would like to file an appropriate bug in Boost.

In https://www.boost.org/doc/libs/1_73_0/boost/spirit/home/support/char_encoding/iso8859_1.hpp
It's not a cut and dried bug, but I do think removing the high bytes from these comments is a good idea, so it's probably worth filing the bug. The C++ standard doesn't say anything about the encoding of characters on disk (the "input encoding" of bytes -> source character set) - it starts with the source character set, I think. So what do implementations do?
In practice this doesn't actually affect compilation because the bad UTF-8 sequence in the comment is never parsed: clang just skips over it looking for */. It probably mostly affects tools that use line/column coordinates (like clangd by virtue of LSP, and clang diagnostics), and tools that extract comment contents (doxygen et al).

Thanks for the detailed analysis! I have filed https://github.com/boostorg/spirit/issues/612
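
For anyone else who wants to locate the offending bytes before filing a similar report, here is a rough standalone sketch of reporting bad UTF-8 by line and column instead of asserting on it. It is not the code from this patch and does not use any clangd or LLVM API: `BadByte`, `findBadUtf8`, and `logBadUtf8` are hypothetical names made up for illustration. It only checks the lead-byte/continuation-byte structure, so overlong encodings and surrogates would slip through.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Location of one byte that cannot be part of a well-formed UTF-8 sequence.
struct BadByte {
  std::size_t Line;   // 1-based line number
  std::size_t Column; // 1-based column, counted in bytes
  unsigned char Byte; // the offending byte value
};

// Hypothetical helper (an assumption, not a clangd function): collects every
// byte that fails a structural UTF-8 check. Structural only: overlong forms
// and surrogate code points are not rejected.
std::vector<BadByte> findBadUtf8(const std::string &Text) {
  std::vector<BadByte> Bad;
  std::size_t Line = 1, Column = 1;
  for (std::size_t I = 0; I < Text.size();) {
    unsigned char C = static_cast<unsigned char>(Text[I]);
    // Expected sequence length from the lead byte; 0 means "not a lead byte".
    std::size_t Len = 0;
    if (C < 0x80)
      Len = 1;
    else if ((C & 0xE0) == 0xC0)
      Len = 2;
    else if ((C & 0xF0) == 0xE0)
      Len = 3;
    else if ((C & 0xF8) == 0xF0)
      Len = 4;
    // The sequence must fit in the buffer and be followed by 10xxxxxx bytes.
    bool Ok = Len != 0 && I + Len <= Text.size();
    for (std::size_t J = 1; Ok && J < Len; ++J)
      Ok = (static_cast<unsigned char>(Text[I + J]) & 0xC0) == 0x80;
    if (!Ok) {
      // Record the byte and carry on, rather than asserting.
      Bad.push_back({Line, Column, C});
      Len = 1; // resynchronise on the next byte
    }
    if (C == '\n') {
      ++Line;
      Column = 1;
    } else {
      Column += Len;
    }
    I += Len;
  }
  return Bad;
}

// Prints one diagnostic-style line per bad byte to stderr.
void logBadUtf8(const char *Path, const std::string &Text) {
  for (const BadByte &B : findBadUtf8(Text))
    std::fprintf(stderr, "%s:%zu:%zu: invalid UTF-8 byte 0x%02X\n", Path,
                 B.Line, B.Column, static_cast<unsigned>(B.Byte));
}
```

Reading a header into a `std::string` and passing it to `logBadUtf8` should flag a lone ISO-8859-1 high byte inside a comment, which is the kind of input discussed above. Columns here are byte offsets, so they will not match LSP's UTF-16 columns on lines that also contain valid multi-byte characters.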