This patch fixes a number of bugs found in the YAML parser through fuzzing. In general, this makes the parser more robust against malformed inputs.
The fixes are mostly improved null checking and returning errors in more cases. In some cases, asserts were changed to regular errors, this provides the same robustness but also protects release builds from the triggering conditions. This also improves the fuzzability of the YAML parser since asserts can act as a roadblock to further fuzzing once they're hit.
Each fix has a corresponding test case:
- TestAnchorMapError - Added proper null pointer handling in Stream::printError if N is null and KeyValueNode::getValue if getKey returns null, Input::createHNodes dyn_casts changed to dyn_cast_or_null so the null pointer checks are actually able to fail
- TestFlowSequenceTokenErrors - Added case in Document::parseBlockNode for FlowMappingEnd, FlowSequenceEnd, or FlowEntry tokens outside of mappings or sequences
- TestDirectiveMappingNoValue - Changed assert to regular error return in Scanner::scanValue
- TestUnescapeInfiniteLoop - Fixed infinite loop in ScalarNode::unescapeDoubleQuoted by returning an error for unrecognized escape codes
- TestScannerUnexpectedCharacter - Changed asserts to regular error returns in Scanner::consume
- TestUnknownDirective - For both of the inputs the stream doesn't fail and correctly returns TK_Error, but there is no valid root node for the document. There's no reasonable way to make the scanner fail for unknown directives without breaking the YAML spec (see spec-07-01.test). I think the assert is unnecessary given that an error is still generated for this case.
The SimpleKeys.clear() line fixes a bug found by AddressSanitizer triggered by multiple test cases - when TokenQueue is cleared SimpleKeys is still holding dangling pointers into it, so SimpleKeys should be cleared as well.
This error and the one below should be something closer to: "Unable to parse non-ascii here".