This is an archive of the discontinued LLVM Phabricator instance.

[YAML] Trim trailing whitespace from plain scalars
ClosedPublic

Authored by rkayaith on Oct 31 2022, 3:01 PM.

Details

Summary

In some cases plain scalars are currently parsed with a trailing
newline. In particular this shows up often when parsing JSON files, e.g.
note the \n after 456 below:

$ cat test.yaml
{
  "foo": 123,
  "bar": 456
}
$ yaml-bench test.yaml -canonical
%YAML 1.2
---
!!map {
  ? !!str "foo"
  : !!str "123",
  ? !!str "bar"
  : !!str "456\n",
}
...

The trailing whitespace ends up causing the conversion of the scalar to
int/bool/etc. to fail, causing the issue seen here:
https://github.com/llvm/llvm-project/issues/15877

From reading the YAML spec (https://yaml.org/spec/1.2.2/#733-plain-style)
it seems like plain scalars should never end with whitespace, so this
change trims all trailing whitespace characters from the
value (specifically b-line-feed, b-carriage-return, s-space, and
s-tab).

Diff Detail

Event Timeline

rkayaith created this revision.Oct 31 2022, 3:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 31 2022, 3:01 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
rkayaith added inline comments.Oct 31 2022, 7:14 PM
llvm/lib/Support/YAMLParser.cpp
2044–2049

An alternative to this might be to change the scanner to match the spec more closely, but I'm not very familiar with the spec or that code so I couldn't tell if that would be straightforward to do.

rkayaith published this revision for review.Oct 31 2022, 7:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 31 2022, 7:14 PM
rkayaith updated this revision to Diff 472303.Nov 1 2022, 7:56 AM
rkayaith edited the summary of this revision. (Show Details)

update summary

Adding reviewers based on recent commits

scott.linder accepted this revision.Feb 9 2023, 3:17 PM

Sorry for the delay in reviewing! LGTM with a couple nits

llvm/lib/Support/YAMLParser.cpp
2043–2044

Nit: I believe this is just a stale comment from the partial implementation of block scalars before https://reviews.llvm.org/D9503

Could you remove the mention of "block" while you're at it?

2044–2048

Could you mention the alternative of handling this in the scanner? I am not certain this is the right place for this, but if it is moving us closer to the spec I think it is OK regardless. The YAML spec is so vast and impenetrable that I don't suspect we will ever be fully compliant anyway :)

This revision is now accepted and ready to land.Feb 9 2023, 3:17 PM
rkayaith updated this revision to Diff 496279.Feb 9 2023, 4:30 PM
rkayaith marked 2 inline comments as done.

update comments

Sorry for the delay in reviewing!

No worries, thanks for the review!

This revision was automatically updated to reflect the committed changes.