This is an archive of the discontinued LLVM Phabricator instance.

[YAML] Recover gracefully when deserializing invalid YAML input.

Authored by tulloch on Jul 30 2013, 3:37 AM.


  1. See for a user's bug report.
  2. We weren't checking for Document::getRoot() returning NULL. This happens

in the case where the input YAML/JSON object is malformed.

  1. Document::getRoot() is passed to isa<NullNode>, which asserts on it's

argument being NULL.

  1. To fix this, we check for this return value, and set the error code

accordingly. We also modify the calling code to check the return value
of input::setCurrentDocument().

The added unit tests hit the assert in trunk, while passing on this branch.

Additionally, the command that reproduces this assert in the bug report:

llvm-build/Debug+Asserts/bin/clang-format  -style="{BasedOnStyle:llvm}" tools/clang/tools/clang-format/ClangFormat.cpp

now outputs

Diff Detail

Event Timeline

tulloch updated this revision to Unknown Object (????).Jul 30 2013, 5:45 PM

Removed braces around single-line conditional block.



Wouldn't this whole body be simpler as:

if (yin.setCurrentDocument())
  yamlize(yin, docMap, true);
return yin;
tulloch added inline comments.Jul 31 2013, 11:10 AM

Yes, that would be simpler. I'll update.


This call can potentially output errors without allowing us to set the DiagHandler.


This unfortunately outputs an error message into the unit test logs, since Document.begin() is called before we setDiagHandler.

We can convert it to

a) optionally pass DiagHandler in the constructor,
b) lazily initialize the DocIterator,
c) explicitly initialize Input, etc.
d) just ignore it.

Any thoughts on this?

tulloch updated this revision to Unknown Object (????).Aug 1 2013, 10:27 PM

Cleaned up control flow somewhat.

Does this handle empty input well ( If so, please add a test.

Plus a few comments inline.


nit: add a space after the comma.


Option a) seems like the best solution here.

tulloch planned changes to this revision.Nov 13 2013, 9:06 AM

@alexfh, thanks for the comments (and the pointer to the other bug report). Will update the diff with your suggested changes.


Sure, good catch.

tulloch updated this revision to Unknown Object (????).Nov 15 2013, 5:51 AM

Updated with reviewer comments, and added unit tests covering the empty string for the sequence/map with {required, optional} keys cases.

tulloch updated this revision to Unknown Object (????).Nov 15 2013, 6:24 AM

Fixed spelling error in unit test naming.

Thanks for the updated patch! It looks good overall. Please see a few comments inline.


nit: Please add spaces around assignments.

I'd also rename Handler to DiagHandler and rephrase the comment:
"The DiagHandler can be specified to provide alternative error reporting."


I don't see any need in this method, as it's only used directly after constructor (which can be wrong, if the constructor call generates errors on its own). So I suggest removing it.

I only could find usages of this method in two files:

(the second one resides in a different branch of the repository, and can be updated separately after this patch gets in).


Is CurrentNode NULL only when the document is empty? Can it be NULL when parsing invalid documents?


nit: please put a period in the end of the comment.


nit: in LLVM it's more common to write:

tulloch updated this revision to Unknown Object (????).Nov 17 2013, 3:45 AM

Updated with reviewer comments.

tulloch added inline comments.Nov 17 2013, 6:18 AM

It would be null in the case where the document is invalid, yes. In that case we would have set EC in setCurrentDocument(), and so wouldn't have reached this part of the method.

Would you like this comment to be clarified?

Looks good with one change in a comment.

Do you commit access? If not, I can commit the patch for you (and make the change in the comment myself).


Yes, the comment needs to be clarified. Mostly by removing the description of _what_ we're doing, and leaving only _why_, e.g. "CurrentNode is null for empty documents, which is an error in the presence of required keys." or something similar. What do you think?

@alexfh - thanks for reviewing the patch. I don't have commit access, and feel free to commit the patch (with whatever changes you feel like - the rewording you proposed is much clearer).




I think your wording is much clearer, thanks!

alexfh accepted this revision.Nov 18 2013, 7:31 AM
alexfh closed this revision.Nov 18 2013, 7:56 AM

Committed revision 195016.