
Details

Reviewers

grosbach
andreadb
efriedma
rnk

Commits

rG3dd72ea810db: [MC] Fix floating-point literal lexing.
rL357214: [MC] Fix floating-point literal lexing.

Summary

Fix LexFloatLiteral Lexing to enforce the correct format before returning AsmToken::Real. It now reports an error on a wider range of invalid inputs. See test update for details.
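To make "enforce the correct format" concrete, here is a minimal sketch of a strict check for a decimal floating-point literal. The function name `isWellFormedDecimalFloat` and the exact grammar are assumptions for illustration; the rules the patch finally settled on may differ (binutils compatibility comes up later in the review).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of LLVM. Returns true if `s` matches
//   [0-9]+ ( '.' [0-9]* )? ( [eE] [+-]? [0-9]+ )?
// with nothing left over. The grammar is an assumption for illustration.
bool isWellFormedDecimalFloat(const std::string &s) {
  std::size_t i = 0;
  // Consumes a run of digits and returns how many were consumed.
  auto skipDigits = [&]() -> std::size_t {
    std::size_t start = i;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
      ++i;
    return i - start;
  };

  if (skipDigits() == 0) // need at least one integral digit
    return false;
  if (i < s.size() && s[i] == '.') {
    ++i;
    skipDigits(); // fraction digits are optional ("2." is fine)
  }
  if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {
    ++i;
    if (i < s.size() && (s[i] == '+' || s[i] == '-'))
      ++i;
    if (skipDigits() == 0) // "1.2e" and "1.2e+" are rejected here
      return false;
  }
  return i == s.size(); // trailing text such as the second "e1" in "1e1e1" fails
}
```

Under this grammar `1.e5` and `2.` are accepted, while `1.2e+`, `1.2e`, and `1e1e1` are rejected.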

Diff Detail

Repository: rL LLVM

Event Timeline

BrandonTJones created this revision. Jan 28 2019, 4:41 AM

Herald added subscribers: llvm-commits, hiraditya. Jan 28 2019, 4:41 AM

I fixed the -U modifier on my diff, as the previous diff was incorrect.

I don't think I am the right person to review this?

In D57321#1373767, @clayborg wrote:

I don't think I am the right person to review this?

This is correct, my apologies

BrandonTJones edited reviewers, added: grosbach; removed: clayborg. Jan 29 2019, 2:02 AM

BrandonTJones added a reviewer: andreadb. Jan 29 2019, 3:14 AM

efriedma added a subscriber: efriedma. Jan 30 2019, 3:19 PM

efriedma added inline comments.

llvm/lib/MC/MCParser/AsmLexer.cpp
69	Spelling
76	The comment here seems to indicate the current behavior of LexFloatLiteral is intentional, and the caller (e.g. AsmParser::parseRealValue or AsmParser::parsePrimaryExpr) should handle ill-formed floats with a more specific error message. Do you think that design is wrong? If you do think that design is wrong, please update the comments to describe what you think should happen instead.

efriedma added reviewers: efriedma, rnk. Jan 30 2019, 3:19 PM

I have fixed a spelling error and updated a comment to better reflect the change made.

BrandonTJones marked 2 inline comments as done. Feb 4 2019, 5:07 AM

efriedma added inline comments. Feb 4 2019, 4:19 PM

llvm/lib/MC/MCParser/AsmLexer.cpp
64	Probably should also fix this comment.
152	'e' and 'E' are identifier characters, so some of the checks here are redundant.
153	`*CurPtr == 'e' && *CurPtr == 'E'` is impossible. We clearly need more test coverage given this issue wasn't caught by tests.
338	It's hard to follow this logic when it's tangled together like this; does this accept `1.+1`? Need more test coverage to catch cases like this.
343	If we conclude the suffix doesn't qualify as a float, we apparently treat the suffix as an identifier; is that right? Are the resulting diagnostics really going to be understandable? (I guess "unexpected token in '.double' directive" is okay, although not great.) Should we worry about binutils compatibility at all? It apparently treats `1.e` as equivalent to `1.e0`.
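The "impossible" condition from the line 153 comment can be reduced to a single character, as in the sketch below. The function names are hypothetical, and the `||` version is only a guess at what the check was meant to express.

```cpp
// The condition flagged in the review, reduced to one character.
// No character equals both 'e' and 'E', so this always returns false.
bool impossibleCheck(char c) { return c == 'e' && c == 'E'; }

// Presumably the intent (an assumption): either case marks an exponent.
bool isExponentMarker(char c) { return c == 'e' || c == 'E'; }

// Tries every 8-bit value to confirm that impossibleCheck never succeeds.
bool anyCharPassesImpossibleCheck() {
  for (int c = 0; c < 256; ++c)
    if (impossibleCheck(static_cast<char>(c)))
      return true;
  return false;
}
```

A branch guarded by the `&&` form is dead code, which is why tests that never exercise it cannot catch the mistake.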

BrandonTJones marked an inline comment as done. Feb 6 2019, 5:18 AM

BrandonTJones added inline comments.

llvm/lib/MC/MCParser/AsmLexer.cpp
343	I think the diagnostics should be okay. For binutils compat, does it always treat a missing exponent as "0", or only in the case of <digits>.e? There seem to be tests in place that expect the program to die in response to these cases instead of handling them.

BrandonTJones marked an inline comment as not done. Feb 6 2019, 5:39 AM

For binutils compat, does it always treat a missing exponent as "0", or only in the case of <digits>.e?
There seem to be tests in place that expect the program to die in response to these cases instead of handling them.

Herald added a project: Restricted Project. Feb 11 2019, 1:48 AM

Refined tests. Added binutils compat.

Herald added subscribers: kristina, dexonsmith. Feb 18 2019, 2:10 AM

BrandonTJones marked 5 inline comments as done. Feb 18 2019, 2:11 AM

BrandonTJones marked an inline comment as done.

efriedma added inline comments. Feb 20 2019, 4:22 PM

llvm/include/llvm/MC/MCParser/AsmLexer.h
65	Spelling ("separated").
llvm/lib/MC/MCParser/AsmLexer.cpp
74	Is this early return necessary, or just to try to improve the error messages?
341	Instead of adding a boolean parameter to LexFloatLiteral, can we make the "++CurPtr" conditional? It's easier to follow the logic if CurPtr is always before the "E" when LexFloatLiteral is called.
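The suggestion on line 341 can be illustrated with a toy lexer over a `std::string`. Everything here (`lexFloatTail`, `lexAfterIntegerDigits`, index-based scanning) is a hypothetical stand-in for the pointer-based code in AsmLexer.cpp. The point is only that the caller consumes a `.` if there is one and leaves any `e`/`E` for the float routine, so no boolean flag is needed.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Toy float-tail lexer (hypothetical). On entry `pos` is always at the
// fraction digits or at the 'e'/'E', never past the exponent marker.
// Returns the index one past the end of the literal.
std::size_t lexFloatTail(const std::string &s, std::size_t pos) {
  auto isDigitAt = [&](std::size_t i) {
    return i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]));
  };
  while (isDigitAt(pos)) // fraction digits
    ++pos;
  if (pos < s.size() && (s[pos] == 'e' || s[pos] == 'E')) {
    ++pos;
    if (pos < s.size() && (s[pos] == '+' || s[pos] == '-'))
      ++pos;
    while (isDigitAt(pos)) // exponent digits
      ++pos;
  }
  return pos;
}

// Toy caller (hypothetical). `pos` is just past the integer digits.
// Only a '.' is consumed here; an exponent marker is left untouched.
std::size_t lexAfterIntegerDigits(const std::string &s, std::size_t pos) {
  if (pos < s.size() && s[pos] == '.')
    ++pos; // the conditional increment
  return lexFloatTail(s, pos);
}
```

With this shape, `1.5e3` and `1e5` both reach `lexFloatTail` with the cursor in front of the exponent marker, so a single code path handles both.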

Removed boolean param

BrandonTJones marked an inline comment as done. Feb 21 2019, 5:50 AM

BrandonTJones added inline comments.

llvm/lib/MC/MCParser/AsmLexer.cpp
74	It is there to improve the error message; it feels ambiguous to let the error come from the parser not expecting two floats in a row.

So I guess overall, there are three fixes here:

Make AsmLexer::LexDigit handle floats without a decimal point more consistently.
Make AsmLexer::LexFloatLiteral print an error for floats which are apparently missing an "e".
Make APFloat::convertFromString use binutils-compatible exponent parsing.

Is that right?
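For the third item, the behavior described earlier in the review (`1.e` treated like `1.e0`) amounts to reading an exponent with no digits as zero. The sketch below shows that rule in isolation. `parseExponentLenient` is a hypothetical helper, not the APFloat code, and whether a bare sign such as the `+` in `1.2e+` should also count as zero is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper. `text` is whatever follows the 'e'/'E' marker.
// An exponent with no digits is read as 0 (assumed lenient rule).
// Returns false if anything other than an optional sign and digits is
// present. No overflow handling: this is a sketch, not production code.
bool parseExponentLenient(const std::string &text, int &exponent) {
  std::size_t i = 0;
  bool negative = false;
  if (i < text.size() && (text[i] == '+' || text[i] == '-')) {
    negative = text[i] == '-';
    ++i;
  }
  int value = 0;
  while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
    value = value * 10 + (text[i] - '0');
    ++i;
  }
  if (i != text.size())
    return false; // trailing junk, e.g. the "e1" left over from "1e1e1"
  exponent = negative ? -value : value;
  return true;
}
```

A strict parser would instead fail when no digit follows the marker; the lenient rule trades that diagnostic for compatibility.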

llvm/lib/MC/MCParser/AsmLexer.cpp
82	Maybe update this comment?
151	This change doesn't do anything?
llvm/test/MC/AsmParser/floating-literals.s
62	We should probably have testcases for 1E1, 1e1e1, and 1e-1, since those don't work correctly without this patch.
llvm/unittests/ADT/APFloatTest.cpp
1196 ↗	(On Diff #187776)	We should probably keep these testcases, just change them to check for the new behavior (using ASSERT_EQ).

Added more complete test coverage.

BrandonTJones marked 5 inline comments as done. Feb 25 2019, 3:19 AM

BrandonTJones added inline comments.

llvm/lib/MC/MCParser/AsmLexer.cpp
151	This change arose from a merge conflict in my local repo. The new diff is closer to the order the code had before the patch, which is why I have kept it.

BrandonTJones marked an inline comment as done. Feb 27 2019, 1:48 AM

ping

dexonsmith removed a subscriber: dexonsmith. Mar 13 2019, 12:26 PM

efriedma added inline comments. Mar 15 2019, 2:38 PM

llvm/test/MC/AsmParser/floating-literals.s
62	In this context, 1E1 is different from 1e1... Probably best to check all of these with lowercase and uppercase "E".

efriedma added inline comments. Mar 15 2019, 2:39 PM

llvm/test/MC/AsmParser/floating-literals.s
62	Err, just realized my comment "1E1 is different from 1e1" might be unclear. They should be treated the same way, but LLVM without this patch treats them differently, so we should have test coverage.
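One way to keep lowercase and uppercase coverage in step is to derive the uppercase case from the lowercase one. The sketch below does this outside LLVM, using the C library's `std::strtod` as a stand-in for the assembler; `parsesTo` and `withUpperE` are hypothetical helpers and say nothing about how AsmLexer itself behaves.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical test helper built on the C library, not on AsmLexer.
// Succeeds only if std::strtod consumes all of `text` and yields `expected`.
bool parsesTo(const std::string &text, double expected) {
  char *end = nullptr;
  double value = std::strtod(text.c_str(), &end);
  return end == text.c_str() + text.size() && value == expected;
}

// Returns a copy of `text` with every 'e' replaced by 'E'.
std::string withUpperE(std::string text) {
  for (char &c : text)
    if (c == 'e')
      c = 'E';
  return text;
}
```

Each lowercase case such as `1e1` or `5e-1` can then be checked twice, once as written and once through `withUpperE`, which is the pairing the review asks for in floating-literals.s.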

Added a new test

Just to be clear, should I add a capital 'E' version of all the new tests I have added with this patch, or just the one mentioned?

It would be better to add a capital E version of all the relevant tests in the file, I think.

Added test cases for 'E' cases.

BrandonTJones marked 2 inline comments as done. Mar 26 2019, 7:24 AM

LGTM. Thanks for sticking with this for so many rounds of review.

This revision is now accepted and ready to land. Mar 27 2019, 2:01 PM

Not a problem at all. Feel free to commit this on my behalf as I do not have permissions. Thanks!

Closed by commit rL357214: [MC] Fix floating-point literal lexing. (authored by efriedma). Mar 28 2019, 2:13 PM

This revision was automatically updated to reflect the committed changes.

Diff 183829

llvm/include/llvm/MC/MCParser/AsmLexer.h

(56 lines hidden)
private:
AsmToken ReturnError(const char *Loc, const std::string &Msg);		AsmToken ReturnError(const char *Loc, const std::string &Msg);

AsmToken LexIdentifier();		AsmToken LexIdentifier();
AsmToken LexSlash();		AsmToken LexSlash();
AsmToken LexLineComment();		AsmToken LexLineComment();
AsmToken LexDigit();		AsmToken LexDigit();
AsmToken LexSingleQuote();		AsmToken LexSingleQuote();
AsmToken LexQuote();		AsmToken LexQuote();
AsmToken LexFloatLiteral();		AsmToken LexFloatLiteral(bool isDotSeperated);
		efriedma: Spelling ("separated").
AsmToken LexHexFloatLiteral(bool NoIntDigits);		AsmToken LexHexFloatLiteral(bool NoIntDigits);

StringRef LexUntilEndOfLine();		StringRef LexUntilEndOfLine();
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_MC_MCPARSER_ASMLEXER_H		#endif // LLVM_MC_MCPARSER_ASMLEXER_H

llvm/lib/MC/MCParser/AsmLexer.cpp

(55 lines hidden)
}		}

int AsmLexer::getNextChar() {		int AsmLexer::getNextChar() {
if (CurPtr == CurBuf.end())		if (CurPtr == CurBuf.end())
return EOF;		return EOF;
return (unsigned char)*CurPtr++;		return (unsigned char)*CurPtr++;
}		}

/// LexFloatLiteral: [0-9][.][0-9]([eE][+-]?[0-9]*)?		/// LexFloatLiteral: [0-9][.][0-9]([eE][+-]?[0-9]*)?
		efriedma: Probably should also fix this comment.
///		///
/// The leading integral digit sequence and dot should have already been		/// The leading integral digit sequence and dot should have already been
/// consumed, some or all of the fractional digit sequence can have been		/// consumed, some or all of the fractional digit sequence can have been
/// consumed.		/// consumed.
AsmToken AsmLexer::LexFloatLiteral() {		AsmToken AsmLexer::LexFloatLiteral(bool isDotSeperated) {
		efriedma: Spelling
// Skip the fractional digit sequence.		// Skip the fractional digit sequence.
while (isDigit(*CurPtr))		while (isDigit(*CurPtr))
++CurPtr;		++CurPtr;

// Check for exponent; we intentionally accept a slighlty wider set of		// Check for exponent; we intentionally accept a slighlty wider set of
		efriedma: Is this early return necessary, or just to try to improve the error messages?
		BrandonTJones: It is there to improve the error message; it feels ambiguous to let the error come from the parser not expecting two floats in a row.
// literals here and rely on the upstream client to reject invalid ones (e.g.,		// literals here and rely on the upstream client to reject invalid ones (e.g.,
// "1e+").		// "1e+").
		efriedma: The comment here seems to indicate the current behavior of LexFloatLiteral is intentional, and the caller (e.g. AsmParser::parseRealValue or AsmParser::parsePrimaryExpr) should handle ill-formed floats with a more specific error message. Do you think that design is wrong? If you do think that design is wrong, please update the comments to describe what you think should happen instead.
if (*CurPtr == 'e' || *CurPtr == 'E') {		if (isDotSeperated && (*CurPtr == 'e' || *CurPtr == 'E') &&
		(isDigit(CurPtr[1]) ||
		((CurPtr[1] == '-' || CurPtr[1] == '+') && isDigit(CurPtr[2])))) {
++CurPtr;		++CurPtr;
if (*CurPtr == '-' || *CurPtr == '+')		if ((*CurPtr == '-' || *CurPtr == '+') && isDigit(CurPtr[1]))
++CurPtr;		++CurPtr;
		efriedma: Maybe update this comment?
while (isDigit(*CurPtr))		while (isDigit(*CurPtr))
++CurPtr;		++CurPtr;
}		}

return AsmToken(AsmToken::Real,		return AsmToken(AsmToken::Real,
StringRef(TokStart, CurPtr - TokStart));		StringRef(TokStart, CurPtr - TokStart));
}		}

(51 lines hidden)
}		}

AsmToken AsmLexer::LexIdentifier() {		AsmToken AsmLexer::LexIdentifier() {
// Check for floating point literals.		// Check for floating point literals.
if (CurPtr[-1] == '.' && isDigit(*CurPtr)) {		if (CurPtr[-1] == '.' && isDigit(*CurPtr)) {
// Disambiguate a .1243foo identifier from a floating literal.		// Disambiguate a .1243foo identifier from a floating literal.
while (isDigit(*CurPtr))		while (isDigit(*CurPtr))
++CurPtr;		++CurPtr;
if (*CurPtr == 'e' || *CurPtr == 'E' ||		if ((!IsIdentifierChar(*CurPtr, AllowAtInIdentifier) &&
!IsIdentifierChar(*CurPtr, AllowAtInIdentifier))		(*CurPtr != 'e' && *CurPtr != 'E')) ||
		efriedma: This change doesn't do anything?
		BrandonTJones: This change arose from a merge conflict in my local repo. The new diff is closer to the order the code had before the patch, which is why I have kept it.
return LexFloatLiteral();		(!IsIdentifierChar(*CurPtr, AllowAtInIdentifier) &&
		efriedma: 'e' and 'E' are identifier characters, so some of the checks here are redundant.
		(*CurPtr == 'e' && *CurPtr == 'E') &&
		efriedma: `*CurPtr == 'e' && *CurPtr == 'E'` is impossible. We clearly need more test coverage given this issue wasn't caught by tests.
		(isDigit(CurPtr[1]) ||
		((CurPtr[1] == '-' || CurPtr[1] == '+') && isDigit(CurPtr[2])))))
		return LexFloatLiteral(false);
}		}

while (IsIdentifierChar(*CurPtr, AllowAtInIdentifier))		while (IsIdentifierChar(*CurPtr, AllowAtInIdentifier))
++CurPtr;		++CurPtr;

// Handle . as a special case.		// Handle . as a special case.
if (CurPtr == TokStart+1 && TokStart[0] == '.')		if (CurPtr == TokStart+1 && TokStart[0] == '.')
return AsmToken(AsmToken::Dot, StringRef(TokStart, 1));		return AsmToken(AsmToken::Dot, StringRef(TokStart, 1));
(162 lines hidden)
if (LexMasmIntegers && isdigit(CurPtr[-1])) {
CurPtr = OldCurPtr;		CurPtr = OldCurPtr;
}		}

// Decimal integer: [1-9][0-9]*		// Decimal integer: [1-9][0-9]*
if (CurPtr[-1] != '0' || CurPtr[0] == '.') {		if (CurPtr[-1] != '0' || CurPtr[0] == '.') {
unsigned Radix = doHexLookAhead(CurPtr, 10, LexMasmIntegers);		unsigned Radix = doHexLookAhead(CurPtr, 10, LexMasmIntegers);
bool isHex = Radix == 16;		bool isHex = Radix == 16;
// Check for floating point literals.		// Check for floating point literals.
if (!isHex && (*CurPtr == '.' || *CurPtr == 'e')) {		if ((!isHex && (*CurPtr == '.' || *CurPtr == 'e')) &&
++CurPtr;		(isDigit(CurPtr[1]) || ((CurPtr[1] == '-' || CurPtr[1] == '+' ||
return LexFloatLiteral();		(*CurPtr == '.' && CurPtr[1] == 'e')) &&
		isDigit(CurPtr[2])))) {
		efriedma: It's hard to follow this logic when it's tangled together like this; does this accept `1.+1`? Need more test coverage to catch cases like this.
		++CurPtr;
		if (CurPtr[-1] == '.')
		return LexFloatLiteral(true);
		efriedma: Instead of adding a boolean parameter to LexFloatLiteral, can we make the "++CurPtr" conditional? It's easier to follow the logic if CurPtr is always before the "E" when LexFloatLiteral is called.
		return LexFloatLiteral(false);
}		}
		efriedma: If we conclude the suffix doesn't qualify as a float, we apparently treat the suffix as an identifier; is that right? Are the resulting diagnostics really going to be understandable? (I guess "unexpected token in '.double' directive" is okay, although not great.) Should we worry about binutils compatibility at all? It apparently treats `1.e` as equivalent to `1.e0`.
		BrandonTJones: I think the diagnostics should be okay. For binutils compat, does it always treat a missing exponent as "0", or only in the case of <digits>.e? There seem to be tests in place that expect the program to die in response to these cases instead of handling them.

StringRef Result(TokStart, CurPtr - TokStart);		StringRef Result(TokStart, CurPtr - TokStart);

APInt Value(128, 0, true);		APInt Value(128, 0, true);
if (Result.getAsInteger(Radix, Value))		if (Result.getAsInteger(Radix, Value))
return ReturnError(TokStart, !isHex ? "invalid decimal number" :		return ReturnError(TokStart, !isHex ? "invalid decimal number" :
"invalid hexdecimal number");		"invalid hexdecimal number");

(412 lines hidden)

llvm/test/MC/AsmParser/floating-literals.s

	(44 lines hidden)
	# CHECK: .quad 4681608360884174848			# CHECK: .quad 4681608360884174848
	.double 1e5			.double 1e5
	# CHECK: .quad 4681608360884174848			# CHECK: .quad 4681608360884174848
	.double 1.e5			.double 1.e5
	# CHECK: .quad 4611686018427387904			# CHECK: .quad 4611686018427387904
	.double 2.			.double 2.

	// APFloat should reject these with an error, not crash:			// APFloat should reject these with an error, not crash:
	//.double -1.2e+
	//.double -1.2e			#CHECK-ERROR: unexpected token in '.double' directive
				.double -1.2e+
				#CHECK-ERROR: unexpected token in '.double' directive
				.double -1.2e

	# CHECK: .long 1310177520			# CHECK: .long 1310177520
	.float 0x12f7.1ep+17			.float 0x12f7.1ep+17
	# CHECK: .long 1084227584			# CHECK: .long 1084227584
	.float 0x.ap+3			.float 0x.ap+3
				efriedma: We should probably have testcases for 1E1, 1e1e1, and 1e-1, since those don't work correctly without this patch.
				efriedma: In this context, 1E1 is different from 1e1... Probably best to check all of these with lowercase and uppercase "E".
				efriedma: Err, just realized my comment "1E1 is different from 1e1" might be unclear. They should be treated the same way, but LLVM without this patch treats them differently, so we should have test coverage.
	# CHECK: .quad 4602678819172646912			# CHECK: .quad 4602678819172646912
	.double 0x2.p-2			.double 0x2.p-2
	# CHECK: .long 1094713344			# CHECK: .long 1094713344
	.float 0x3p2			.float 0x3p2
	# CHECK: .long 872284160			# CHECK: .long 872284160
	.float 0x7fp-30			.float 0x7fp-30
	# CHECK: .long 3212836864			# CHECK: .long 3212836864
	.float -0x1.0p0			.float -0x1.0p0
	(18 lines hidden)

This is an archive of the discontinued LLVM Phabricator instance.

Fix LexFloatLiteral Lexing
Closed, Public
