This is an archive of the discontinued LLVM Phabricator instance.

YAML parser robustness improvements
ClosedPublic

Authored by thomasfinch on May 6 2019, 1:18 PM.

Details

Summary

This patch fixes a number of bugs found in the YAML parser through fuzzing. In general, this makes the parser more robust against malformed inputs.

The fixes are mostly improved null checking and returning errors in more cases. In some cases, asserts were changed to regular errors, this provides the same robustness but also protects release builds from the triggering conditions. This also improves the fuzzability of the YAML parser since asserts can act as a roadblock to further fuzzing once they're hit.

Each fix has a corresponding test case:

  • TestAnchorMapError - Added proper null pointer handling in Stream::printError if N is null and KeyValueNode::getValue if getKey returns null, Input::createHNodes dyn_casts changed to dyn_cast_or_null so the null pointer checks are actually able to fail
  • TestFlowSequenceTokenErrors - Added case in Document::parseBlockNode for FlowMappingEnd, FlowSequenceEnd, or FlowEntry tokens outside of mappings or sequences
  • TestDirectiveMappingNoValue - Changed assert to regular error return in Scanner::scanValue
  • TestUnescapeInfiniteLoop - Fixed infinite loop in ScalarNode::unescapeDoubleQuoted by returning an error for unrecognized escape codes
  • TestScannerUnexpectedCharacter - Changed asserts to regular error returns in Scanner::consume
  • TestUnknownDirective - For both of the inputs the stream doesn't fail and correctly returns TK_Error, but there is no valid root node for the document. There's no reasonable way to make the scanner fail for unknown directives without breaking the YAML spec (see spec-07-01.test). I think the assert is unnecessary given that an error is still generated for this case.

The SimpleKeys.clear() line fixes a bug found by AddressSanitizer triggered by multiple test cases - when TokenQueue is cleared SimpleKeys is still holding dangling pointers into it, so SimpleKeys should be cleared as well.

Event Timeline

thomasfinch created this revision.May 6 2019, 1:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 6 2019, 1:18 PM
thomasfinch edited the summary of this revision. (Show Details)May 6 2019, 1:23 PM
thomasfinch added a reviewer: chandlerc.
Bigcheese requested changes to this revision.May 29 2019, 3:36 PM

Thanks for finding this!

I don't think this is the correct fix though, as all of the provided inputs should emit an error. I believe the issue is that parseBlockNode is missing a case for unexpected tokens. Let me know if you need any more info or if you want me to take a look at fixing it.

This revision now requires changes to proceed.May 29 2019, 3:36 PM
thomasfinch updated this revision to Diff 203901.EditedJun 10 2019, 2:43 PM

Move error checking to Document::parseBlockNode, make sure Document::skip doesn't crash if getRoot returns null.

thomasfinch retitled this revision from Fix YAML parser's Document::skip for null nodes to YAML parser robustness improvements.
thomasfinch edited the summary of this revision. (Show Details)

Split test cases apart, added a few more bug fixes

Bigcheese added inline comments.Sep 5 2019, 2:28 PM
lib/Support/YAMLParser.cpp
937 ↗(On Diff #218944)

This error and the one below should be something closer to: "Unable to parse non-ascii here".

1235–1238 ↗(On Diff #218944)

I realize you changed this so fuzzing works, but this isn't how this case should be handled. setError should be for user error, this is a programming error as the situation should never happen.

1949 ↗(On Diff #218944)

We shouldn't use ! in error messages.

Also, please use -U999999 for patches so there's full context.

Update diff based on feedback.

  • Update error message in Scanner::consume
  • Update error message in ScalarNode::unescapeDoubleQuoted
  • Remove setError call when failing to find a SimpleKey in the token queue

Remove trailing spaces in KeyValueNode::getValue

Bigcheese accepted this revision.Oct 30 2019, 2:18 PM

lgtm, sorry for the delay, I didn't see the updates for some reason.

This revision is now accepted and ready to land.Oct 30 2019, 2:18 PM

@Bigcheese Thanks! Could you please merge this, I don't have commit access.

llvm/lib/Support/YAMLParser.cpp
2401

@Bigcheese should I be calling setError here as well?

Rebasing on monorepo

hintonda requested changes to this revision.Nov 4 2019, 4:14 PM
hintonda added inline comments.
llvm/lib/Support/YAMLTraits.cpp
373

Since createHNodes is an internal function and only ever called 3 times within Input:: (and nowhere else), and N is already checked before each of those calls, there's no need add _of_null to these dyn_cast calls and check for null again each time.

To enforce this, I'd suggest you make createHNodes and setError private which will make the class more robust, e.g.:

index f365d60b3ff..fe2fe8c56da 100644
--- a/llvm/include/llvm/Support/YAMLTraits.h
+++ b/llvm/include/llvm/Support/YAMLTraits.h
@@ -1509,6 +1509,7 @@ private:
     std::vector<std::unique_ptr<HNode>> Entries;
   };

+private:
   std::unique_ptr<Input::HNode> createHNodes(Node *node);
   void setError(HNode *hnode, const Twine &message);
   void setError(Node *node, const Twine &message);

However, KeyNode on line 396 could be null, so either checking or adding _or_null to the call is definitely needed there.

This revision now requires changes to proceed.Nov 4 2019, 4:14 PM
  • Revert uses of dyn_cast_or_null
hintonda accepted this revision.Nov 4 2019, 6:40 PM

Great, LGTM..

I'll go ahead and land this for you tomorrow unless @Bigcheese has any objections.

This revision is now accepted and ready to land.Nov 4 2019, 6:40 PM
This revision was automatically updated to reflect the committed changes.