Page MenuHomePhabricator

[yaml][clang-tidy] Fix multiline YAML serialization
ClosedPublic

Authored by DmitryPolukhin on May 20 2020, 8:23 AM.

Details

Summary

Newline duplication logic introduced in https://reviews.llvm.org/D63482
has two issues: (1) there is no logic that removes duplicate newlines
when clang-apply-replacment reads YAML and (2) in general such logic
should be applied to all strings and should happen on string
serialization level instead in YAML parser.

This diff changes multiline strings quotation from single quote ' to
double ". It solves problems with internal newlines because now they are
escaped. Also double quotation solves the problem with leading whitespace after
newline. In case of single quotation YAML parsers should remove leading
whitespace according to specification. In case of double quotation these
leading are internal space and they are preserved. There is no way to
instruct YAML parsers to preserve leading whitespaces after newline so
double quotation is the only viable option that solves all problems at
once.

Diff Detail

Event Timeline

DmitryPolukhin created this revision.May 20 2020, 8:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2020, 8:23 AM
DmitryPolukhin edited the summary of this revision. (Show Details)May 20 2020, 8:25 AM

Fix clang-tidy warnings

Fixed second string after new line + more test case

Thanks for doing this! I didn't work on the YAML parser before, so I cannot give formal approval.

llvm/lib/Support/YAMLTraits.cpp
888

I wonder whether using StringRef::split() would lead to an easier implementation (https://llvm.org/doxygen/classllvm_1_1StringRef.html#af0284e4c41c0e09c0bc4767bc77a899d)

DmitryPolukhin marked an inline comment as done.

+ @gribozavr, @Eugene.Zelenko, @thegameg who touched/reviewed this code, please take a look.

llvm/lib/Support/YAMLTraits.cpp
888

I'm not sure that it will be easier to read or more efficient (StringRef::split will require additional vector).

mgehre added inline comments.Jun 2 2020, 11:08 AM
llvm/lib/Support/YAMLTraits.cpp
888

I don't expect the difference in efficiency to be noticable, but the code would look like

SmallVector<StringRef, 2> Lines;
StringRef(Val).split(Lines, '\n');
bool First = true;
for (StringRef Line: Lines) {
  if (First)
    First = false;
  else
    Out << '\n';
  Out << Line;
}

which I personally find easier to follow than CurrentPos, LineBreakPos and their arithmetic.

Apply suggested changes with string split

DmitryPolukhin marked an inline comment as done.Jun 4 2020, 1:45 PM
njames93 added inline comments.Jun 4 2020, 3:15 PM
llvm/lib/Support/YAMLTraits.cpp
888

If you want to do the same here...

SmallVector<StringRef, 8> Lines;
Scalar.split(Lines, "\n\n");
Val = llvm::join(Lines, "\n");
return StringRef();
DmitryPolukhin marked an inline comment as done.

Rewrite input string split too

DmitryPolukhin marked 2 inline comments as done.

Fix spelling

DmitryPolukhin added inline comments.Jun 5 2020, 4:22 AM
llvm/lib/Support/YAMLTraits.cpp
888

Personally I don't like approach with string split because it is potentially less efficient due to memory allocations in case of multiple newlines that can be avoided. BUT I would like to move the needle and fix this real bug in clang ASAP. So I'm ready to rewrite it to something that (1) works and (2) moves review forward. So thank you for review and please take another look.

Fix single new line handling, it should be replace with space

@njames93 - friendly ping, could you please take another look.

It looks like there is no support for the proposed solution so I found alternative solution that might be even better. We can use double quotation " for multiline strings. It solves problem because in case of double quotation LLVM escapes new line like \n so there is no need to double newlines. Escaped newlines can be parsed correctly by other YAML parsers like pyyaml. BTW, LLVM YAML reading also has issue with not removing leading spaces for multiline strings so multiline strings serialised by pyyaml with single quotation cannot be parsed correctly by clang-apply-replacements. With double quotation it seems to work fine in both directions.

What do you think about using double quotation for multiline strings?

Use double quotation for multiline strings. It solves problems with internal newlines because now they are escaped. Also double quotation solves the problem with leading whitespace after newline. In case of single quotation YAML parsers should remove leading whitespace according to specification. In case of double quotation these leading are internal space and they are preserved. There is no way to instruct YAML parsers to preserve leading whitespaces after newline so double quotation is the only viable option that solves all problems at once.

DmitryPolukhin retitled this revision from [yaml][clang-tidy] Fix new line YAML serialization to [yaml][clang-tidy] Fix multiline YAML serialization.Jun 24 2020, 3:09 AM
DmitryPolukhin edited the summary of this revision. (Show Details)
DmitryPolukhin edited the summary of this revision. (Show Details)

@njames93 and @aaron.ballman - please take a look to this diff. Multiline replacements in YAML are broken and cannot be applied correctly.

aaron.ballman accepted this revision.Jul 7 2020, 7:07 AM

LGTM, this seems like a pretty clean solution. Thanks!

This revision is now accepted and ready to land.Jul 7 2020, 7:07 AM
This revision was automatically updated to reflect the committed changes.

@aaron.ballman - thank you for the review!