This is an archive of the discontinued LLVM Phabricator instance.

encode YAML scalars using double quotes so all values can be expressed
AbandonedPublic

Authored by inglorion on Jun 6 2017, 2:47 PM.

Details

Summary

Previously, our YAML encoder used single quotes to encode scalars that
cannot be expressed as plain scalars. Single-quoted scalars can encode
some values that plain scalars cannot, but fall short of being able to
encode all scalars. This change makes the YAML encoder use
double-quoted scalars instead, which can encode all character values.

Event Timeline

inglorion created this revision.Jun 6 2017, 2:47 PM

The vast majority of changes here are updating tests to expect double quotes instead of single quotes. The actual change to the encoder is in llvm/lib/Support/YAMLTraits.cpp.

Alternative considered: Use single quotes when sufficient, and double quotes only if necessary. This would complicate the encoder somewhat, because there is an additional decision that needs to be made, and possibly some work to be re-done (e.g. if a double quote has already been included in the string). On the plus side, it would not require all the tests to be updated. In the end, I decided that I prefer to have a single quoting style used in every case, rather than using a not fully expressive quoting style in the common case and a different quoting style in rarer cases. The test changes only have to be made once, and are mostly mechanical.

zturner edited edge metadata.Jun 6 2017, 3:06 PM

I know you're working on this because of some issues with a certain PDB symbol. Any chance you could send me the PDB that is triggering the problem?

zturner added inline comments.Jun 7 2017, 3:52 PM
llvm/lib/Support/YAMLTraits.cpp
624

Is there any reason not to use isprint here (the semantics of isprint are slightly different than your if-case, btw)

625–628

You could use Encoded += "\\x" + toHex(StringRef(&C, 1)); here.

630

It looks like the previous code did not escape backslashes. Was this a bug in the previous code? If so we should add a test for it.

inglorion added inline comments.Jun 7 2017, 4:49 PM
llvm/lib/Support/YAMLTraits.cpp
624

Not really. I just don't know off the top of my head if isprint checks the same thing as YAML's "printable character". For characters with code points >=0 and <= 127, it seems like it does. Since I'm not handling Unicode here, I could presumably replace the check here with isprint. Would you like me to do that?

625–628

Will do.

630

The old code didn't escape backslashes, which is actually correct - single quotes in YAML don't support any escape sequences, with the exception of two single quotes to encode a single single quote. Backslashes are used in some of the tests, so we know they are handled correctly both before and after this change, but I'll add an explicit test in YAMLIOTest (where I've already added tests for control characters and characters we expect to appear literally).

inglorion updated this revision to Diff 102077.Jun 9 2017, 2:55 PM
inglorion marked 3 inline comments as done.

Implemented @zturner's suggestions

zturner: Do you have some time to look at this (D33961) and D34013?

This looks ok to me, but +joerg since there was another patch recently related to quoting and escaping of yaml, and he had some thoughts there. So just want to make sure this doesn't run contrary to any of the ideas he expressed in the other revision.

inglorion abandoned this revision.May 30 2019, 1:58 PM

I think r320996 implements this.