This is an archive of the discontinued LLVM Phabricator instance.

[YAML] Escape non-printable multibyte UTF8 in Output::scalarString.
Closed, Public

Authored by graydon on Mar 23 2018, 10:02 PM.

Details

Summary

The existing YAML Output::scalarString code path includes a partial and
incorrect implementation of YAML escaping logic. In particular, the logic put
in place in rL321283 escapes non-printable bytes only if they are not part of a
multibyte UTF8 sequence; implicitly this means that all multibyte UTF8
sequences -- printable and non-printable alike -- are passed through verbatim.

The simplest solution to this is to direct the Output::scalarString method to
use the standalone yaml::escape function, and this _almost_ works, except that
the existing code in that function _over_-escapes: any multibyte UTF8 sequence
is escaped, even a printable one. While this is permitted by YAML, it is also
more aggressive than necessary (and makes non-English text hard to read),
and the entire point of rL321283 was to back off such aggressive over-escaping.

So in this change, I have both redirected Output::scalarString to use
yaml::escape _and_ modified yaml::escape to optionally restrict its escaping to
non-printables. This preserves the behaviour of existing clients while giving
them a path to more moderate escaping should they want it.
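
To make the intended behaviour concrete, here is a minimal standalone sketch of
"escape everything multibyte" versus "escape only non-printables". It is not
the code in this diff and does not use LLVM: the names escapeSketch,
decodeOne, isYamlPrintable and the escapePrintable flag are hypothetical, the
printable set is assumed to be the c-printable production from the YAML 1.2
spec, and all escapes are written in the generic \xNN / \uNNNN / \UNNNNNNNN
forms rather than YAML's shorter named escapes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: decode one UTF-8 sequence starting at s[i].
// Returns false on a malformed or truncated sequence (len stays 1).
// Simplification: overlong encodings are not rejected.
static bool decodeOne(const std::string &s, std::size_t i, std::uint32_t &cp,
                      std::size_t &len) {
  assert(i < s.size());
  unsigned char b0 = static_cast<unsigned char>(s[i]);
  len = 1;
  cp = b0;
  if (b0 < 0x80)
    return true;
  std::size_t need;
  if ((b0 & 0xE0) == 0xC0) { need = 1; cp = b0 & 0x1F; }
  else if ((b0 & 0xF0) == 0xE0) { need = 2; cp = b0 & 0x0F; }
  else if ((b0 & 0xF8) == 0xF0) { need = 3; cp = b0 & 0x07; }
  else return false;
  if (i + need >= s.size())
    return false;
  for (std::size_t k = 1; k <= need; ++k) {
    unsigned char b = static_cast<unsigned char>(s[i + k]);
    if ((b & 0xC0) != 0x80)
      return false;
    cp = (cp << 6) | (b & 0x3F);
  }
  len = need + 1;
  return true;
}

// Assumed printable set: c-printable from the YAML 1.2 spec.
static bool isYamlPrintable(std::uint32_t cp) {
  return cp == 0x9 || cp == 0xA || cp == 0xD || (cp >= 0x20 && cp <= 0x7E) ||
         cp == 0x85 || (cp >= 0xA0 && cp <= 0xD7FF) ||
         (cp >= 0xE000 && cp <= 0xFFFD) || (cp >= 0x10000 && cp <= 0x10FFFF);
}

static void appendEscape(std::string &out, std::uint32_t cp) {
  char buf[16];
  const char *fmt = cp <= 0xFF ? "\\x%02X" : cp <= 0xFFFF ? "\\u%04X" : "\\U%08X";
  std::snprintf(buf, sizeof buf, fmt, static_cast<unsigned>(cp));
  out += buf;
}

// Sketch only. escapePrintable == true escapes every multibyte sequence;
// false escapes multibyte sequences only when the code point is non-printable.
std::string escapeSketch(const std::string &in, bool escapePrintable = true) {
  std::string out;
  for (std::size_t i = 0; i < in.size();) {
    std::uint32_t cp = 0;
    std::size_t len = 1;
    if (!decodeOne(in, i, cp, len)) {
      // Malformed byte: always escape the raw byte value.
      appendEscape(out, static_cast<unsigned char>(in[i]));
    } else if (cp == '\\') {
      out += "\\\\";
    } else if (cp == '"') {
      out += "\\\"";
    } else if (cp < 0x80) {
      // ASCII: keep 0x20..0x7E, escape control characters (simplification:
      // tab, LF and CR are escaped as \xNN too).
      if (cp >= 0x20 && cp <= 0x7E)
        out += static_cast<char>(cp);
      else
        appendEscape(out, cp);
    } else if (escapePrintable || !isYamlPrintable(cp)) {
      appendEscape(out, cp);
    } else {
      out.append(in, i, len); // printable multibyte: pass through verbatim
    }
    i += len;
  }
  return out;
}
```

With the flag left at its default, a printable character such as U+00E9 is
still escaped, which matches the "existing clients keep their behaviour" goal
above; with the flag set to false only code points such as U+0080 or U+FFFE
are escaped and printable text stays readable.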

Diff Detail

Repository
rL LLVM

Event Timeline

graydon created this revision. Mar 23 2018, 10:02 PM
thegameg accepted this revision. Mar 24 2018, 2:12 PM

LGTM. Thanks for fixing this the right way, it looks much better now!

This revision is now accepted and ready to land. Mar 24 2018, 2:12 PM
This revision was automatically updated to reflect the committed changes.
graydon retitled this revision from "Summary:" to "[YAML] Escape non-printable multibyte UTF8 in Output::scalarString.". Mar 27 2018, 12:58 PM