This is an archive of the discontinued LLVM Phabricator instance.

Differential D41834

[Lex] Fix handling numerical literals ending with ' and signed exponent.
ClosedPublic

Authored by vsapsai on Jan 8 2018, 1:03 PM.

Details

Reviewers

rsmith
t.p.northover

Commits

rGf7d393ccb138: Merging r324419: …
rG579f0b307c19: [Lex] Fix handling numerical literals ending with ' and signed exponent.
rL324579: Merging r324419:
rC324419: [Lex] Fix handling numerical literals ending with ' and signed exponent.
rL324419: [Lex] Fix handling numerical literals ending with ' and signed exponent.

Summary

For the input 0'e+1, the lexer tokenized only 0'e as a numeric constant. Later,
NumericLiteralParser skipped 0 and ' as digits and parsed e+1 as a valid
exponent, going past the end of the token. Because it didn't mark the numeric
literal as having an error, it continued parsing and tried to expandUCNs
with a StringRef of length -2.

The fix is not to parse the exponent once we have reached the end of the token.
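
The overrun described above can be reproduced in isolation. The following is a minimal sketch, not the Clang code: `skipDigits` and `charsLeftAfterExponent` are hypothetical stand-ins, and the `guardSign` flag exists only to contrast the old and new behaviour of the sign check.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical stand-in for SkipDigits: advance over digits and ' separators,
// stopping at tokEnd if the scan reaches it exactly.
static const char *skipDigits(const char *p, const char *tokEnd) {
  while (p != tokEnd &&
         (std::isdigit(static_cast<unsigned char>(*p)) || *p == '\''))
    ++p;
  return p;
}

// Scans digits and an optional exponent, then returns how many characters of
// the token are left over. A negative result means the scan ran past tokEnd.
// Assumption: the buffer extends beyond tokEnd and ends in a non-digit.
std::ptrdiff_t charsLeftAfterExponent(const char *tokBegin, const char *tokEnd,
                                      bool guardSign) {
  const char *s = skipDigits(tokBegin, tokEnd);
  if (s != tokEnd && (*s == 'e' || *s == 'E')) {
    ++s; // skip the exponent marker
    bool maySkipSign = !guardSign || s != tokEnd;
    if (maySkipSign && (*s == '+' || *s == '-'))
      ++s; // sign
    s = skipDigits(s, tokEnd);
  }
  return tokEnd - s;
}
```

For the buffer `0'e+1;` with a token covering only the first three characters, the unguarded scan steps over `+`, misses the `p != tokEnd` stop condition, and returns -2; the guarded scan stops at the token end and returns 0.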

Discovered by OSS-Fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=4588

rdar://problem/36076719

Diff Detail

Repository: rC Clang

Event Timeline

vsapsai created this revision. Jan 8 2018, 1:03 PM

Harbormaster completed remote builds in B13593: Diff 128972. Jan 8 2018, 1:03 PM

This fixes the OSS-Fuzz bug, but I don't know if it is sufficient. Should I also make Lexer::LexNumericConstant include the +1 part in the tok::numeric_constant token?

The lexer is doing the right thing; per the C++ lexical rules, the +1 is not part of the token in this case.
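
One way to see why the token ends before the sign is a simplified pp-number scanner. This is a sketch under stated assumptions: it follows the C++17 pp-number grammar (including `p`/`P` exponents), handles ASCII only, and `ppNumberLength` is a hypothetical name, not a Clang function.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

static bool isDigitChar(char c) {
  return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Identifier-nondigit, restricted to ASCII letters and '_' for this sketch.
static bool isNonDigitChar(char c) {
  return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Length of the longest pp-number prefix of `in`, or 0 if there is none.
std::size_t ppNumberLength(const std::string &in) {
  std::size_t i = 0;
  if (!in.empty() && isDigitChar(in[0]))
    i = 1; // pp-number: digit
  else if (in.size() >= 2 && in[0] == '.' && isDigitChar(in[1]))
    i = 2; // pp-number: . digit
  else
    return 0;

  while (i < in.size()) {
    char c = in[i];
    char next = i + 1 < in.size() ? in[i + 1] : '\0';
    if ((c == 'e' || c == 'E' || c == 'p' || c == 'P') &&
        (next == '+' || next == '-'))
      i += 2; // exponent marker immediately followed by a sign
    else if (isDigitChar(c) || isNonDigitChar(c) || c == '.')
      i += 1; // digit, identifier-nondigit, or '.'
    else if (c == '\'' && (isDigitChar(next) || isNonDigitChar(next)))
      i += 2; // ' digit or ' nondigit
    else
      break;
  }
  return i;
}
```

In `0'e+1` the `e` is consumed together with the preceding `'` as a "' nondigit" pair, so the following `+` is never seen as part of an "e sign" pair and the scan stops after `0'e`. In `0e+1`, by contrast, `e+` is consumed as a unit and the whole text is one token.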

I don't think this fix is in the right place; we will still examine characters after the end of the literal, even with this applied, and that doesn't seem right to me (even though the literal parser is constructed in such a way that it is valid to do so, as long as it doesn't read past a nul byte). It looks like the problem is that NumericLiteralParser::ParseDecimalOrOctalCommon will examine *s in cases where it might point past the end of the literal; changing

if (*s == '+' || *s == '-')  s++; // sign

to

if (s != ThisTokEnd && (*s == '+' || *s == '-'))  s++; // sign

would seem appropriate. But I think I'd be most in favor of that change plus your change plus a change to suppress the "no digits in suffix" error if we've already had an error. Seem reasonable?
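
The combined plan (bounds-checked sign, error flag set by the separator check, follow-up error suppressed) can be sketched as a small standalone checker. Everything here is hypothetical and heavily simplified: `ExponentChecker` handles only decimal exponents and collects messages in a vector instead of using Clang's diagnostics machinery.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical mini-checker; names and structure are illustrative only.
struct ExponentChecker {
  std::vector<std::string> diags;
  bool hadError = false;

  static bool isDigit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
  }

  // Advance over digits and ' separators without passing end.
  static const char *skipDigits(const char *p, const char *end) {
    while (p != end && (isDigit(*p) || *p == '\''))
      ++p;
    return p;
  }

  void check(const std::string &tok) {
    const char *begin = tok.data();
    const char *end = begin + tok.size();
    const char *s = skipDigits(begin, end);
    if (s == end || (*s != 'e' && *s != 'E'))
      return; // no exponent to check

    // A separator directly before the exponent marker is an error.
    if (s != begin && s[-1] == '\'') {
      diags.push_back("digit separator cannot appear at end of digit sequence");
      hadError = true;
    }

    ++s; // skip 'e' or 'E'
    if (s != end && (*s == '+' || *s == '-'))
      ++s; // sign, only if still inside the token

    const char *firstNonDigit = skipDigits(s, end);
    bool sawDigit = false;
    for (const char *p = s; p != firstNonDigit; ++p)
      sawDigit = sawDigit || isDigit(*p);

    // Report the missing digits only if nothing was reported before.
    if (!sawDigit && !hadError) {
      diags.push_back("exponent has no digits");
      hadError = true;
    }
  }
};
```

Checking the token `0'e` with this sketch yields only the separator message, while `0e` yields only "exponent has no digits" and `0e+1` yields nothing.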

Yep, the plan sounds reasonable. I also noticed that we have

if (*s == '+' || *s == '-')  s++; // sign

code in NumericLiteralParser::ParseNumberStartingWithZero too. I plan to make the same change for hexadecimal numbers and check the behaviour in the debugger.

Don't parse exponent past the end of token, add same test+fix for hexadecimal numbers.

I've addressed all known issues, please take another look.

Checked that the suggested change also fixes another OSS-Fuzz bug: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=4664

vsapsai edited the summary of this revision. Feb 1 2018, 11:54 AM

Herald added a subscriber: jkorous-apple. Feb 1 2018, 11:54 AM

Ping.

LGTM, thanks!

This revision is now accepted and ready to land. Feb 6 2018, 1:22 PM

Thanks for the review.

Closed by commit rL324419: [Lex] Fix handling numerical literals ending with ' and signed exponent. (authored by vsapsai). Feb 6 2018, 2:43 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Feb 6 2018, 2:43 PM

Closed by commit rC324419: [Lex] Fix handling numerical literals ending with ' and signed exponent. (authored by vsapsai). Feb 6 2018, 2:43 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                     Size
lib/Lex/LiteralSupport.cpp               24 lines
test/Lexer/cxx1y_digit_separators.cpp    2 lines

Diff 133085

lib/Lex/LiteralSupport.cpp

[732 lines not shown; context: if (*s == '.') {]
     s = SkipDigits(s); // Skip suffix.
   }
   if (*s == 'e' || *s == 'E') { // exponent
     checkSeparator(TokLoc, s, CSK_AfterDigits);
     const char *Exponent = s;
     s++;
     radix = 10;
     saw_exponent = true;
-    if (*s == '+' || *s == '-')  s++; // sign
+    if (s != ThisTokEnd && (*s == '+' || *s == '-'))  s++; // sign
     const char *first_non_digit = SkipDigits(s);
     if (containsDigits(s, first_non_digit)) {
       checkSeparator(TokLoc, s, CSK_BeforeDigits);
       s = first_non_digit;
     } else {
+      if (!hadError) {
         PP.Diag(PP.AdvanceToTokenCharacter(TokLoc, Exponent-ThisTokBegin),
                 diag::err_exponent_has_no_digits);
         hadError = true;
+      }
       return;
     }
   }
 }

 /// Determine whether a suffix is a valid ud-suffix. We avoid treating reserved
 /// suffixes as ud-suffixes, because the diagnostic experience is better if we
 /// treat it as an invalid suffix.
[24 lines not shown; context: void NumericLiteralParser::checkSeparator(SourceLocation TokLoc,]
                                           CheckSeparatorKind IsAfterDigits) {
   if (IsAfterDigits == CSK_AfterDigits) {
     if (Pos == ThisTokBegin)
       return;
     --Pos;
   } else if (Pos == ThisTokEnd)
     return;

-  if (isDigitSeparator(*Pos))
+  if (isDigitSeparator(*Pos)) {
     PP.Diag(PP.AdvanceToTokenCharacter(TokLoc, Pos - ThisTokBegin),
             diag::err_digit_separator_not_between_digits)
       << IsAfterDigits;
+    hadError = true;
+  }
 }

 /// ParseNumberStartingWithZero - This method is called when the first character
 /// of the number is found to be a zero. This means it is either an octal
 /// number (like '04') or a hex number ('0x123a') a binary number ('0b1010') or
 /// a floating point number (01239.123e4). Eat the prefix, determining the
 /// radix etc.
 void NumericLiteralParser::ParseNumberStartingWithZero(SourceLocation TokLoc) {
[33 lines not shown; context: if ((c1 == 'x' || c1 == 'X') && (isHexDigit(s[1]) || s[1] == '.')) {]

     // A binary exponent can appear with or with a '.'. If dotted, the
     // binary exponent is required.
     if (*s == 'p' || *s == 'P') {
       checkSeparator(TokLoc, s, CSK_AfterDigits);
       const char *Exponent = s;
       s++;
       saw_exponent = true;
-      if (*s == '+' || *s == '-')  s++; // sign
+      if (s != ThisTokEnd && (*s == '+' || *s == '-'))  s++; // sign
       const char *first_non_digit = SkipDigits(s);
       if (!containsDigits(s, first_non_digit)) {
+        if (!hadError) {
           PP.Diag(PP.AdvanceToTokenCharacter(TokLoc, Exponent-ThisTokBegin),
                   diag::err_exponent_has_no_digits);
           hadError = true;
+        }
         return;
       }
       checkSeparator(TokLoc, s, CSK_BeforeDigits);
       s = first_non_digit;

       if (!PP.getLangOpts().HexFloats)
         PP.Diag(TokLoc, PP.getLangOpts().CPlusPlus
                             ? diag::ext_hex_literal_invalid
[872 lines not shown]

test/Lexer/cxx1y_digit_separators.cpp

[45 lines not shown; context: namespace floating {]
   float p = 0'e1; // expected-error {{digit separator cannot appear at end of digit sequence}}
   float q = 0'0e1;
   float r = 0.'0e1; // expected-error {{digit separator cannot appear at start of digit sequence}}
   float s = 0.0'e1; // expected-error {{digit separator cannot appear at end of digit sequence}}
   float t = 0.0e'1; // expected-error {{digit separator cannot appear at start of digit sequence}}
   float u = 0x.'p1f; // expected-error {{hexadecimal floating literal requires a significand}}
   float v = 0e'f; // expected-error {{exponent has no digits}}
   float w = 0x0p'f; // expected-error {{exponent has no digits}}
+  float x = 0'e+1; // expected-error {{digit separator cannot appear at end of digit sequence}}
+  float y = 0x0'p+1; // expected-error {{digit separator cannot appear at end of digit sequence}}
 }

 #line 123'456
 static_assert(__LINE__ == 123456, "");

 // x has value 0 in C++11 and 34 in C++1y.
 #define M(x, ...) __VA_ARGS__
 constexpr int x = { M(1'2,3'4) };
[21 lines not shown]