This is an archive of the discontinued LLVM Phabricator instance.


Differential D41834

[Lex] Fix handling numerical literals ending with ' and signed exponent.
Closed · Public

Authored by vsapsai on Jan 8 2018, 1:03 PM.


Details

Reviewers

rsmith
t.p.northover

Commits

rGf7d393ccb138: Merging r324419: --------------------------------------------------------------…
rG579f0b307c19: [Lex] Fix handling numerical literals ending with ' and signed exponent.
rL324579: Merging r324419:
rC324419: [Lex] Fix handling numerical literals ending with ' and signed exponent.
rL324419: [Lex] Fix handling numerical literals ending with ' and signed exponent.

Summary

For the input 0'e+1, the lexer tokenized only 0'e as a numeric constant. Later,
NumericLiteralParser skipped 0 and ' as digits and parsed e+1 as a valid
exponent, going past the end of the token. Because it didn't mark the numeric
literal as having an error, it continued parsing and tried to call expandUCNs
with a StringRef of length -2.

The fix is to stop parsing the exponent once we reach the end of the token.
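The "length -2" can be reproduced with a heavily simplified model of the exponent scan before the fix. This is a hypothetical sketch, not clang's code: scanExponentUnchecked and remainingAfterScan are illustrative names, and the model assumes, as the summary describes, that 0 and ' have already been consumed as digits, leaving s on the e while the token ends right after it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical, heavily simplified model of the exponent scan before the fix;
// not clang's code. 's' points just past the mantissa digits and 'end' is one
// past the token's last character. The source buffer continues after 'end'
// (and is NUL-terminated), so reading there does not crash; it just consumes
// characters that belong to the following tokens.
const char *scanExponentUnchecked(const char *s, const char *end) {
  assert(s < end && "scan starts inside the token");
  if (*s == 'e' || *s == 'E') {
    ++s;                                   // for the token "0'e", s == end here
    if (*s == '+' || *s == '-')            // BUG: no 's != end' check
      ++s;                                 // s is now past the end of the token
    // Like SkipDigits, this loop only stops at 'end' if it lands on it exactly.
    while (s != end && std::isdigit(static_cast<unsigned char>(*s)))
      ++s;
  }
  return s;
}

// The input from the report: the source text is "0'e+1", but the lexer's
// token is only "0'e". Returns end - s, the length a later step (such as the
// one calling expandUCNs) would compute for the rest of the token.
std::ptrdiff_t remainingAfterScan() {
  static const char buffer[] = "0'e+1";
  const char *tokEnd = buffer + 3;         // one past the 'e'
  const char *s = buffer + 2;              // after treating "0'" as digits
  s = scanExponentUnchecked(s, tokEnd);
  return tokEnd - s;
}
```

Here remainingAfterScan() returns -2: the unchecked sign test consumes the + outside the token, and because the digit loop only stops when it lands exactly on the end pointer, it then steps onto the 1 as well.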

Discovered by OSS-Fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=4588

rdar://problem/36076719

Diff Detail

Build Status

Buildable 13593
Build 13593: arc lint + arc unit

Event Timeline

vsapsai created this revision. Jan 8 2018, 1:03 PM

Harbormaster completed remote builds in B13593: Diff 128972. Jan 8 2018, 1:03 PM

This fixes the OSS-Fuzz bug, but I don't know if it is sufficient. Should I also make Lexer::LexNumericConstant include the +1 part in the tok::numeric_constant token?

The lexer is doing the right thing; per the C++ lexical rules, the +1 is not part of the token in this case.
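Why the +1 stays outside the token can be checked against a simplified model of the C++ pp-number grammar ([lex.ppnumber]). This is a sketch under stated assumptions: ASCII input only, no universal-character-names, the p/P sign forms (from C++17) included, and ppNumberLength is a hypothetical name. A sign is absorbed only when it directly follows e, E, p or P on an existing pp-number. In 0'e+1 the e can only be reached through the "pp-number ' nondigit" rule, because 0' alone is not a pp-number, so no "e sign" rule applies and the token ends before the +.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Simplified model of the pp-number grammar: ASCII only, no UCNs.
static bool isDigitChar(char c) {
  return std::isdigit(static_cast<unsigned char>(c)) != 0;
}
static bool isNondigit(char c) {
  return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Returns the length of the pp-number at the start of 'src', or 0 if none.
std::size_t ppNumberLength(const std::string &src) {
  const std::size_t n = src.size();
  std::size_t i = 0;
  if (n >= 1 && isDigitChar(src[0]))
    i = 1;                                  // digit
  else if (n >= 2 && src[0] == '.' && isDigitChar(src[1]))
    i = 2;                                  // . digit
  else
    return 0;
  while (i < n) {
    const char c = src[i];
    const bool nextIsSign = i + 1 < n && (src[i + 1] == '+' || src[i + 1] == '-');
    if ((c == 'e' || c == 'E' || c == 'p' || c == 'P') && nextIsSign)
      i += 2;                               // pp-number e sign (also E, p, P)
    else if (isDigitChar(c) || isNondigit(c) || c == '.')
      i += 1;                               // pp-number digit / identifier-nondigit / .
    else if (c == '\'' && i + 1 < n &&
             (isDigitChar(src[i + 1]) || isNondigit(src[i + 1])))
      i += 2;                               // pp-number ' digit / ' nondigit
    else
      break;                                // anything else ends the token
  }
  return i;
}
```

With this model, ppNumberLength("0'e+1") is 3 (the token 0'e), while ppNumberLength("1e+5") is 4 because there the e directly follows a complete pp-number.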

I don't think this fix is in the right place; even with it applied, we still examine characters after the end of the literal, and that doesn't seem right to me (even though the literal parser is constructed so that doing so is valid, as long as it doesn't read past a nul byte). It looks like the problem is that NumericLiteralParser::ParseDecimalOrOctalCommon examines *s in cases where s might point past the end of the literal; changing

if (*s == '+' || *s == '-')  s++; // sign

to

if (s != ThisTokEnd && (*s == '+' || *s == '-'))  s++; // sign

would seem appropriate. But I'd be most in favor of that change, plus your change, plus a change to suppress the "no digits in suffix" error if we've already had an error. Does that seem reasonable?
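The three changes proposed above can be combined in one simplified model: a misplaced separator sets hadError (as the checkSeparator change in this revision does), the sign is examined only when s != end, and the follow-on "exponent has no digits" diagnostic (assumed here to be the "no digits in suffix" error mentioned in the review) is skipped when an error was already reported. This is a hypothetical sketch; MiniParser and its members are illustrative names, not clang's API, and it omits the separator checks clang performs before exponent digits.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical, simplified model of the proposed fix (not clang's API).
// Handles only decimal digits, ' separators, and an optional e/E exponent.
struct MiniParser {
  const char *begin;               // first character of the token
  const char *end;                 // one past the last character of the token
  const char *s;                   // current position
  bool hadError = false;
  std::vector<std::string> diags;

  MiniParser(const char *b, const char *e) : begin(b), end(e), s(b) {}

  // Like SkipDigits: digits and separators, never past 'end'.
  const char *skipDigits(const char *p) const {
    while (p != end && (std::isdigit(static_cast<unsigned char>(*p)) || *p == '\''))
      ++p;
    return p;
  }

  void error(const std::string &msg) {
    diags.push_back(msg);
    hadError = true;               // every diagnostic marks the literal as bad
  }

  void parseDecimal() {
    assert(begin <= end);
    s = skipDigits(s);
    // A separator that ends the digit sequence is an error and sets hadError.
    if (s != begin && s[-1] == '\'')
      error("digit separator cannot appear at end of digit sequence");
    if (s != end && (*s == 'e' || *s == 'E')) {
      ++s;
      if (s != end && (*s == '+' || *s == '-'))   // bounds-checked sign
        ++s;
      const char *firstNonDigit = skipDigits(s);
      if (firstNonDigit == s) {
        if (!hadError)                            // suppress follow-on error
          error("exponent has no digits");
        return;
      }
      s = firstNonDigit;
    }
  }
};
```

For the token 0'e (inside the text 0'e+1), this model stops exactly at the token end and reports only the separator error, which matches the single expected-error in the test added by this revision.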

Yep, the plan sounds reasonable. I also noticed that we have

if (*s == '+' || *s == '-')  s++; // sign

code in NumericLiteralParser::ParseNumberStartingWithZero too. I plan to make the same change for hexadecimal numbers and check the behaviour in the debugger.
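On the hexadecimal path the exponent marker is p/P, but its digits are still decimal, so the same bound check applies. A hypothetical sketch of one helper shared by both paths, so the two copies of the sign check cannot drift apart; scanExponent and ExponentScan are illustrative names, and the review does not say whether clang factors the code this way. The input 0x0'p+1 is used as an assumed hexadecimal analogue of 0'e+1; its token is 0x0'p.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical result of scanning an optional exponent such as e+10 or p-3.
struct ExponentScan {
  const char *next;   // first character after the exponent; never past 'end'
  bool sawExponent;   // an exponent marker was present
  bool hasDigits;     // at least one digit followed the marker (and sign)
};

// Scans an exponent introduced by 'marker' ('e' for decimal literals, 'p' for
// hexadecimal floating literals; the uppercase form is accepted too). Every
// dereference of 's' is guarded by 's != end', so nothing outside the token
// is examined. Exponent digits are decimal in both cases.
ExponentScan scanExponent(const char *s, const char *end, char marker) {
  assert(s <= end);
  const char upper =
      static_cast<char>(std::toupper(static_cast<unsigned char>(marker)));
  if (s == end || (*s != marker && *s != upper))
    return {s, false, false};
  ++s;
  if (s != end && (*s == '+' || *s == '-'))
    ++s;
  const char *digits = s;
  while (s != end && std::isdigit(static_cast<unsigned char>(*s)))
    ++s;
  return {s, true, s != digits};
}
```

For the token 0x0'p, scanning from the p yields sawExponent without digits and stops at the token end instead of consuming +1, leaving the caller to report (or suppress) the diagnostic.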

Don't parse the exponent past the end of the token; add the same test and fix for hexadecimal numbers.

I've addressed all known issues, please take another look.

Checked that the suggested change also fixes another OSS-Fuzz bug: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=4664

vsapsai edited the summary of this revision. Feb 1 2018, 11:54 AM

Herald added a subscriber: jkorous-apple. Feb 1 2018, 11:54 AM

Ping.

LGTM, thanks!

This revision is now accepted and ready to land. Feb 6 2018, 1:22 PM

Thanks for the review.

Closed by commit rL324419: [Lex] Fix handling numerical literals ending with ' and signed exponent. (authored by vsapsai). Feb 6 2018, 2:43 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Feb 6 2018, 2:43 PM

Closed by commit rC324419: [Lex] Fix handling numerical literals ending with ' and signed exponent. (authored by vsapsai). Feb 6 2018, 2:43 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                           Size
clang/lib/Lex/LiteralSupport.cpp               4 lines
clang/test/Lexer/cxx1y_digit_separators.cpp    1 line

Diff 128972

clang/lib/Lex/LiteralSupport.cpp

[781 unchanged lines omitted; context: void NumericLiteralParser::checkSeparator(SourceLocation TokLoc,]
                                           CheckSeparatorKind IsAfterDigits) {
   if (IsAfterDigits == CSK_AfterDigits) {
     if (Pos == ThisTokBegin)
       return;
     --Pos;
   } else if (Pos == ThisTokEnd)
     return;
 
-  if (isDigitSeparator(*Pos))
+  if (isDigitSeparator(*Pos)) {
     PP.Diag(PP.AdvanceToTokenCharacter(TokLoc, Pos - ThisTokBegin),
             diag::err_digit_separator_not_between_digits)
       << IsAfterDigits;
+    hadError = true;
+  }
 }
 
 /// ParseNumberStartingWithZero - This method is called when the first character
 /// of the number is found to be a zero. This means it is either an octal
 /// number (like '04') or a hex number ('0x123a') a binary number ('0b1010') or
 /// a floating point number (01239.123e4). Eat the prefix, determining the
 /// radix etc.
 void NumericLiteralParser::ParseNumberStartingWithZero(SourceLocation TokLoc) {
[927 unchanged lines omitted]

clang/test/Lexer/cxx1y_digit_separators.cpp

[45 unchanged lines omitted; context: namespace floating {]
   float p = 0'e1; // expected-error {{digit separator cannot appear at end of digit sequence}}
   float q = 0'0e1;
   float r = 0.'0e1; // expected-error {{digit separator cannot appear at start of digit sequence}}
   float s = 0.0'e1; // expected-error {{digit separator cannot appear at end of digit sequence}}
   float t = 0.0e'1; // expected-error {{digit separator cannot appear at start of digit sequence}}
   float u = 0x.'p1f; // expected-error {{hexadecimal floating literal requires a significand}}
   float v = 0e'f; // expected-error {{exponent has no digits}}
   float w = 0x0p'f; // expected-error {{exponent has no digits}}
+  float x = 0'e+1; // expected-error {{digit separator cannot appear at end of digit sequence}}
 }
 
 #line 123'456
 static_assert(__LINE__ == 123456, "");
 
 // x has value 0 in C++11 and 34 in C++1y.
 #define M(x, ...) __VA_ARGS__
 constexpr int x = { M(1'2,3'4) };
[21 unchanged lines omitted]