Previously, our YAML encoder used single quotes to encode scalars that
cannot be expressed as plain scalars. Single-quoted scalars can encode
some values that plain scalars cannot, but fall short of being able to
encode all scalars. This change makes the YAML encoder use
double-quoted scalars instead, which can encode all character values.
Details
Diff Detail
- Build Status
Buildable 7130 Build 7130: arc lint + arc unit
Event Timeline
The vast majority of changes here are updating tests to expect double quotes instead of single quotes. The actual change to the encoder is in llvm/lib/Support/YAMLTraits.cpp.
Alternative considered: Use single quotes when sufficient, and double quotes only if necessary. This would complicate the encoder somewhat, because there is an additional decision that needs to be made, and possibly some work to be re-done (e.g. if a double quote has already been included in the string). On the plus side, it would not require all the tests to be updated. In the end, I decided that I prefer to have a single quoting style used in every case, rather than using a not fully expressive quoting style in the common case and a different quoting style in rarer cases. The test changes only have to be made once, and are mostly mechanical.
I know you're working on this because of some issues with a certain PDB symbol. Any chance you could send me the PDB that is triggering the problem?
llvm/lib/Support/YAMLTraits.cpp | ||
---|---|---|
624 | Is there any reason not to use isprint here (the semantics of isprint are slightly different than your if-case, btw) | |
625–628 | You could use Encoded += "\\x" + toHex(StringRef(&C, 1)); here. | |
630 | It looks like the previous code did not escape backslashes. Was this a bug in the previous code? If so we should add a test for it. |
llvm/lib/Support/YAMLTraits.cpp | ||
---|---|---|
624 | Not really. I just don't know off the top of my head if isprint checks the same thing as YAML's "printable character". For characters with code points >=0 and <= 127, it seems like it does. Since I'm not handling Unicode here, I could presumably replace the check here with isprint. Would you like me to do that? | |
625–628 | Will do. | |
630 | The old code didn't escape backslashes, which is actually correct - single quotes in YAML don't support any escape sequences, with the exception of two single quotes to encode a single single quote. Backslashes are used in some of the tests, so we know they are handled correctly both before and after this change, but I'll add an explicit test in YAMLIOTest (where I've already added tests for control characters and characters we expect to appear literally). |
This looks ok to me, but +joerg since there was another patch recently related to quoting and escaping of yaml, and he had some thoughts there. So just want to make sure this doesn't run contrary to any of the ideas he expressed in the other revision.
Is there any reason not to use isprint here (the semantics of isprint are slightly different than your if-case, btw)