This is an archive of the discontinued LLVM Phabricator instance.

[Lex] Fix handling numerical literals ending with ' and signed exponent.
ClosedPublic

Authored by vsapsai on Jan 8 2018, 1:03 PM.

Details

Summary

For input 0'e+1 lexer tokenized as numeric constant only 0'e. Later
NumericLiteralParser skipped 0 and ' as digits and parsed e+1 as valid
exponent going past the end of the token. Because it didn't mark numeric
literal as having an error, it continued parsing and tried to expandUCNs
with StringRef of length -2.

The fix is not to parse exponent when we reached the end of token.

Discovered by OSS-Fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=4588

rdar://problem/36076719

Diff Detail

Repository
rC Clang

Event Timeline

vsapsai created this revision.Jan 8 2018, 1:03 PM

This fixes the OSS-Fuzz bug but I don't know if it is sufficient. Should I also make Lexer::LexNumericConstant to include +1 part as tok::numeric_constant?

rsmith added a comment.Jan 8 2018, 5:20 PM

The lexer is doing the right thing; per the C++ lexical rules, the +1 is not part of the token in this case.

I don't think this fix is in the right place; we will still examine characters after the end of the literal, even with this applied, and that doesn't seem right to me (even though the literal parser is constructed in such a way that it is valid to do so, as long as it doesn't read past a nul byte). It looks like the problem is that NumericLiteralParser::ParseDecimalOrOctalCommon will examine *s in cases where it might point past the end of the literal; changing

if (*s == '+' || *s == '-')  s++; // sign

to

if (s != ThisTokEnd && (*s == '+' || *s == '-'))  s++; // sign

would seem appropriate. But I think I'd be most in favor of that change plus your change plus a change to suppress the "no digits in suffix" error if we've already had an error. Seem reasonable?

vsapsai planned changes to this revision.Jan 8 2018, 5:57 PM

Yep, the plan sounds reasonable. I also noticed that we have

if (*s == '+' || *s == '-')  s++; // sign

code in NumericLiteralParser::ParseNumberStartingWithZero too. I plan to make the same change for hexadecimal numbers and check the behaviour in debugger.

vsapsai updated this revision to Diff 129125.Jan 9 2018, 10:46 AM
  • Don't parse exponent past the end of token, add same test+fix for hexadecimal numbers.

I've addressed all known issues, please take another look.

Checked that suggested change also fixes another OSS-Fuzz bug https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=4664

vsapsai edited the summary of this revision. (Show Details)Feb 1 2018, 11:54 AM
rsmith accepted this revision.Feb 6 2018, 1:22 PM

LGTM, thanks!

This revision is now accepted and ready to land.Feb 6 2018, 1:22 PM

Thanks for the review.

This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.