This is an archive of the discontinued LLVM Phabricator instance.


Differential D125353

[AsmParser] Improve error recovery again.
Closed · Public

Authored by lattner on May 11 2022, 12:25 AM.


Details

Reviewers

rriddle
lattner
bzcheeseman

Commits

rG34b6f206cbab: [AsmParser] Improve error recovery again.

Summary

Change the parsing logic to use StringRef instead of lower-level
char* logic. Also, if emitting a diagnostic on the first token
in the file, we make sure to use that position instead of the
very start of the file.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lattner created this revision. May 11 2022, 12:25 AM

Herald added a reviewer: rriddle. May 11 2022, 12:25 AM

Herald added a project: Restricted Project.

Herald added subscribers: sdasgup3, wenzhicui, wrengr and 19 others.

lattner requested review of this revision. May 11 2022, 12:25 AM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

lattner accepted this revision. May 11 2022, 12:25 AM

This revision is now accepted and ready to land. May 11 2022, 12:25 AM

This revision was landed with ongoing or failed builds. May 11 2022, 12:25 AM

Closed by commit rG34b6f206cbab: [AsmParser] Improve error recovery again. (authored by lattner).

This revision was automatically updated to reflect the committed changes.

lattner added a commit: rG34b6f206cbab: [AsmParser] Improve error recovery again.

lattner added a reviewer: bzcheeseman. May 11 2022, 12:52 AM

Harbormaster completed remote builds in B163840: Diff 428573. May 11 2022, 2:58 AM

bzcheeseman added inline comments. May 11 2022, 8:35 AM

mlir/lib/Parser/Parser.cpp
202	StringRef eol = startOfBuffer.detectEOL(); if (!startOfBuffer.endswith(eol)) ?
212	size_t newLineIndex = prevLine.find_last_of(eol); // eol variable from above ?
217	could you use `rsplit` on this? It'd just return a pair of stringrefs and then `startOfBuffer` is just the first (before the comment token).

LG with @bzcheeseman's comments addressed.

lattner added inline comments. May 11 2022, 12:51 PM

mlir/lib/Parser/Parser.cpp
202	detectEOL is... interesting, but massive overkill for what we're doing. I think the existing code is simple and good.
212	Likewise, it is fine to remove \n\r in any combination, and is simpler.
217	I investigated, but not really. rsplit is the wrong thing for lines that contain multiple //'s. We need split from the start of the line. I just tried `startOfBuffer = startOfBuffer.split("//").first;` but that is of course wrong, as is prevLine.split(...) without weird gymnastics. I think the existing code is a reasonable solution here.
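
To make the point about lines with multiple `//` concrete, here is a minimal sketch. It uses `std::string_view` as a stand-in for `llvm::StringRef` so that it compiles without LLVM, and the helper names `beforeFirstComment` and `beforeLastComment` are hypothetical; neither appears in the patch.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: keep everything before the FIRST "//" on the line.
// This mirrors the find("//") approach the patch keeps.
std::string_view beforeFirstComment(std::string_view line) {
  std::size_t pos = line.find("//");
  return pos == std::string_view::npos ? line : line.substr(0, pos);
}

// Hypothetical helper: keep everything before the LAST "//" on the line.
// This is roughly what an rsplit-style cut would produce.
std::string_view beforeLastComment(std::string_view line) {
  std::size_t pos = line.rfind("//");
  return pos == std::string_view::npos ? line : line.substr(0, pos);
}
```

For a line such as `%0 = "foo"() // note // more`, the first helper stops right after the operation, while the second leaves `// note ` in the result, so a diagnostic placed at its end would land inside the comment.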

Thanks for looking into it - if this is simpler than the other thing then by all means let's keep it simple!

Revision Contents

Path                          Size
mlir/lib/Parser/Parser.cpp    32 lines
mlir/test/IR/invalid.mlir     16 lines

Diff 428575

mlir/lib/Parser/Parser.cpp

 /// token is supposed to be.
 InFlightDiagnostic Parser::emitWrongTokenError(const Twine &message) {
   auto loc = state.curToken.getLoc();

   // If the error is to be emitted at EOF, move it back one character.
   if (state.curToken.is(Token::eof))
     loc = SMLoc::getFromPointer(loc.getPointer() - 1);

+  // This is the location we were originally asked to report the error at.
+  auto originalLoc = loc;
+
   // Determine if the token is at the start of the current line.
   const char *bufferStart = state.lex.getBufferBegin();
   const char *curPtr = loc.getPointer();

+  // Use this StringRef to keep track of what we are going to back up through,
+  // it provides nicer string search functions etc.
+  StringRef startOfBuffer(bufferStart, curPtr - bufferStart);
+
   // Back up over entirely blank lines.
   while (1) {
     // Back up until we see a \n, but don't look past the buffer start.
-    curPtr = StringRef(bufferStart, curPtr - bufferStart).rtrim(" \t").end();
+    startOfBuffer = startOfBuffer.rtrim(" \t");

     // For tokens with no preceding source line, just emit at the original
     // location.
-    if (curPtr == bufferStart || curPtr[-1] != '\n')
-      return emitError(loc, message);
+    if (startOfBuffer.empty())
+      return emitError(originalLoc, message);
+
+    // If we found something that isn't the end of line, then we're done.
+    if (startOfBuffer.back() != '\n' && startOfBuffer.back() != '\r')
+      return emitError(SMLoc::getFromPointer(startOfBuffer.end()), message);
+
+    // Drop the \n so we emit the diagnostic at the end of the line.
+    startOfBuffer = startOfBuffer.drop_back();

     // Check to see if the preceding line has a comment on it. We assume that a
     // `//` is the start of a comment, which is mostly correct.
     // TODO: This will do the wrong thing for // in a string literal.
-    --curPtr;
-    auto prevLine = StringRef(bufferStart, curPtr - bufferStart);
-    size_t newLineIndex = prevLine.rfind('\n');
+    auto prevLine = startOfBuffer;
+    size_t newLineIndex = prevLine.find_last_of("\n\r");
     if (newLineIndex != StringRef::npos)
       prevLine = prevLine.drop_front(newLineIndex);
+
+    // If we find a // in the current line, then emit the diagnostic before it.
     size_t commentStart = prevLine.find("//");
     if (commentStart != StringRef::npos)
-      curPtr = prevLine.begin() + commentStart;
-
-    // Otherwise, we can move backwards at least this line.
-    loc = SMLoc::getFromPointer(curPtr);
+      startOfBuffer = startOfBuffer.drop_back(prevLine.size() - commentStart);
   }
 }

 /// Consume the specified token if present and return success. On failure,
 /// output a diagnostic and return failure.
 ParseResult Parser::parseToken(Token::Kind expectedToken,
                                const Twine &message) {
   if (consumeIf(expectedToken))
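
The new-side loop in `emitWrongTokenError` can be tried outside of MLIR with a small standalone sketch. This is an approximation under stated assumptions, not the committed code: `std::string_view` stands in for `llvm::StringRef`, the function name `diagnosticOffset` is hypothetical, and it returns a byte offset into the buffer instead of emitting a diagnostic at an `SMLoc`.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the loop in Parser::emitWrongTokenError: given the
// whole buffer and the offset of the offending token, return the offset at
// which the diagnostic would be reported.
std::size_t diagnosticOffset(std::string_view buffer, std::size_t tokenOffset) {
  // Everything before the token; this view shrinks as we back up.
  std::string_view start = buffer.substr(0, tokenOffset);

  while (true) {
    // Drop trailing spaces and tabs (analogue of StringRef::rtrim(" \t")).
    std::size_t last = start.find_last_not_of(" \t");
    start = last == std::string_view::npos ? std::string_view()
                                           : start.substr(0, last + 1);

    // Only blank lines precede the token: report at the token itself.
    if (start.empty())
      return tokenOffset;

    // Found real text on this line: report just past it.
    if (start.back() != '\n' && start.back() != '\r')
      return start.size();

    // Drop one end-of-line character (either \n or \r).
    start.remove_suffix(1);

    // Isolate the line we just stepped onto.
    std::string_view prevLine = start;
    std::size_t newLine = prevLine.find_last_of("\n\r");
    if (newLine != std::string_view::npos)
      prevLine.remove_prefix(newLine);

    // If that line contains a `//`, cut back to just before the first one.
    std::size_t comment = prevLine.find("//");
    if (comment != std::string_view::npos)
      start.remove_suffix(prevLine.size() - comment);
  }
}
```

In this sketch, a token that follows a line ending in a comment is reported just after the code on that line, a token preceded only by blank lines is reported at its own position, and `\r\n` line endings are skipped one character at a time.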

mlir/test/IR/invalid.mlir

 // expected at the end of foo, not on the return line.
 func.func @error_at_end_of_line() {
   %0 = "foo"()
   // expected-error@-1 {{expected ':' followed by operation type}}

   // This is a comment and so is the thing above.
   return
 }
+
+// -----
+
+// This makes sure we emit an error at the end of the correct line, the : is
+// expected at the end of foo, not on the return line.
+// This shows that it backs up to before the comment.
+func.func @error_at_end_of_line() {
+  %0 = "foo"() // expected-error {{expected ':' followed by operation type}}
+  return
+}
+
+// -----
+
+@foo // expected-error {{expected operation name in quotes}}