[Support] Fix formatted_raw_ostream for UTF-8

Authored by ostannard on Mar 17 2020, 7:13 AM.


  • The getLine and getColumn functions need to update the position, or they will return stale data for buffered streams. This fixes a bug in the clang -analyzer-checker-option-help option, which was not wrapping the help text correctly when stdout was not a TTY.
  • If the stream contains multi-byte UTF-8 sequences, then the whole sequence needs to be considered to be a single character. This has the edge case that the buffer might fill up and be flushed part way through a character.
  • If the stream contains East Asian wide characters, these will be rendered twice as wide as other characters, so we need to increase the column count to match.
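
The three points above can be illustrated with a small standalone sketch. This is not the code from the patch: the class `ColumnTrackingBuffer`, its helper functions, and the short table of wide ranges are all hypothetical, and the width check covers only a few well-known East Asian blocks rather than the full Unicode data. It shows position queries scanning any bytes not yet counted, a partial UTF-8 sequence being carried across a flush, and wide characters advancing the column by two.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helpers for illustration; not LLVM's implementation.
// Length of the UTF-8 sequence introduced by lead byte B.
static std::size_t utf8SequenceLength(unsigned char B) {
  if ((B & 0xE0) == 0xC0) return 2;
  if ((B & 0xF0) == 0xE0) return 3;
  if ((B & 0xF8) == 0xF0) return 4;
  return 1; // ASCII, or an invalid byte counted as one character
}

// Decode one complete sequence (assumed well-formed) to a code point.
static char32_t decodeUtf8(std::string_view Seq) {
  assert(!Seq.empty());
  unsigned char B0 = static_cast<unsigned char>(Seq[0]);
  if (Seq.size() == 1) return B0;
  char32_t CP = B0 & (0xFF >> (Seq.size() + 1));
  for (std::size_t I = 1; I < Seq.size(); ++I)
    CP = (CP << 6) | (static_cast<unsigned char>(Seq[I]) & 0x3F);
  return CP;
}

// Rough approximation: only a few East Asian wide blocks are listed.
static unsigned displayWidth(char32_t CP) {
  bool Wide = (CP >= 0x3040 && CP <= 0x30FF) || // Hiragana, Katakana
              (CP >= 0x4E00 && CP <= 0x9FFF) || // CJK Unified Ideographs
              (CP >= 0xAC00 && CP <= 0xD7A3) || // Hangul Syllables
              (CP >= 0xFF01 && CP <= 0xFF60);   // Fullwidth forms
  return Wide ? 2 : 1;
}

class ColumnTrackingBuffer {
  std::string Buffer;      // bytes written but not yet flushed
  std::size_t Scanned = 0; // prefix of Buffer already counted
  std::string Partial;     // incomplete sequence carried between scans
  unsigned Line = 0, Column = 0;

  void advance(std::string_view Seq) {
    if (Seq.size() == 1 && Seq[0] == '\n') {
      ++Line;
      Column = 0;
    } else {
      Column += displayWidth(decodeUtf8(Seq));
    }
  }

  void scan(std::string_view Bytes) {
    std::size_t I = 0;
    if (!Partial.empty()) { // finish a character split by an earlier scan
      std::size_t Need =
          utf8SequenceLength(static_cast<unsigned char>(Partial[0])) -
          Partial.size();
      std::size_t Take = std::min(Need, Bytes.size());
      Partial.append(Bytes.substr(0, Take));
      I = Take;
      if (Take < Need) return; // still incomplete
      advance(Partial);
      Partial.clear();
    }
    while (I < Bytes.size()) {
      std::size_t Len =
          utf8SequenceLength(static_cast<unsigned char>(Bytes[I]));
      if (I + Len > Bytes.size()) { // sequence continues in a later write
        Partial.assign(Bytes.substr(I));
        return;
      }
      advance(Bytes.substr(I, Len));
      I += Len;
    }
  }

  // Count everything written since the last scan.
  void updatePosition() {
    scan(std::string_view(Buffer).substr(Scanned));
    Scanned = Buffer.size();
  }

public:
  void write(std::string_view S) { Buffer.append(S); }
  void flush() {
    updatePosition(); // count bytes before discarding them
    Buffer.clear();
    Scanned = 0;
  }
  unsigned getLine() { updatePosition(); return Line; }
  unsigned getColumn() { updatePosition(); return Column; }
};
```

In this sketch, writing `"a\xC3\xA9"` leaves the column at 2, and a three-byte wide character split across a flush is counted once, as two columns, only when its final byte arrives.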

This doesn't attempt to handle everything Unicode can do (combining
characters, right-to-left markers, ...), but hopefully covers most
things likely to be common in messages and source code we might want to
print.

Differential revision: https://reviews.llvm.org/D76291