This is an archive of the discontinued LLVM Phabricator instance.

Differential D90234

[MCParser] Correctly handle Windows line-endings when consuming lexed line comments
ClosedPublic

Authored by StephenTozer on Oct 27 2020, 8:09 AM.

Details

Reviewers

lattner
caoz
olista01
grosbach
andreadb

Commits

rG5c6f748cbc17: [MCParser] Correctly handle CRLF line ends when consuming line comments

Summary

Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=47983

The AsmLexer has a function LexLineComment that, as part of lexing, passes the contents of the comment to a CommentConsumer if one exists. The passed comment is meant to exclude newline characters, but it does this by taking the range from the start of the comment (inclusive) to the last newline character (exclusive). This works with Unix line endings, which are a single character, but fails with Windows line endings: the carriage return is included in the passed comment. This causes an issue in llvm-mca, which reads directives that have no label as directives with the label \r, and it may result in inconsistent behaviour for any consumer when switching between line-ending styles.
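The off-by-one described above can be reproduced outside LLVM. The following is a minimal sketch, not the AsmLexer code: commentText and its FixCRLF flag are hypothetical names, and std::string stands in for StringRef. It assumes the comment text is measured back from a cursor, which is either the position after the whole line ending (pre-patch) or the position just after the first newline character (post-patch).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the lexer's cursor logic; not LLVM code.
// Returns the text that would be handed to a comment consumer for the
// line comment that begins at Buf[Start].
std::string commentText(const std::string &Buf, std::size_t Start,
                        bool FixCRLF) {
  assert(Start <= Buf.size() && "comment must start inside the buffer");
  std::size_t Cur = Start;
  bool SawNewline = false;
  char Last = '\0';
  // Consume characters up to and including the first '\n' or '\r'.
  while (Cur < Buf.size()) {
    Last = Buf[Cur++];
    if (Last == '\n' || Last == '\r') {
      SawNewline = true;
      break;
    }
  }
  // Position just past the first line-ending character.
  const std::size_t NewlinePos = Cur;
  // Treat "\r\n" as a single line ending by also consuming the '\n'.
  if (SawNewline && Last == '\r' && Cur < Buf.size() && Buf[Cur] == '\n')
    ++Cur;
  // Pre-patch: measure from the cursor after the whole line ending.
  // Post-patch: measure from the remembered newline position.
  const std::size_t End = FixCRLF ? NewlinePos : Cur;
  const std::size_t Len = End - Start - (SawNewline ? 1 : 0);
  return Buf.substr(Start, Len);
}
```

With an LF-terminated line both modes return the same text. With a CRLF-terminated line, the pre-patch mode keeps the trailing carriage return, which is the stray \r that llvm-mca then interprets as a label.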

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

StephenTozer created this revision. Oct 27 2020, 8:09 AM

Herald added a project: Restricted Project. Oct 27 2020, 8:09 AM

Herald added subscribers: llvm-commits, hiraditya.

StephenTozer requested review of this revision. Oct 27 2020, 8:09 AM

Harbormaster completed remote builds in B76560: Diff 300986. Oct 27 2020, 8:16 AM

test case?

I've added a test for the symptom that revealed this bug in llvm-mca. I'm also writing a unit test for AsmLexer that tests the underlying behaviour by verifying that CommentConsumers will not be sent characters that are not part of the line comment, since the problem is not specific to llvm-mca (although it's the only place that has seen an error so far, as far as I can tell).

Herald added a reviewer: andreadb. Oct 28 2020, 9:26 AM

Herald added subscribers: jdoerfert, gbedwell.

Harbormaster completed remote builds in B76741: Diff 301305. Oct 28 2020, 9:27 AM

Ping - there may be room for more testing further down the line, but upon reflection I think the current test is sufficient for this patch.

As the test needs to have its CRLF line endings preserved, it needs to be checked into git as a binary file. Unfortunately, this means that the contents of the file are not uploaded to Phabricator, so I'll paste the contents of the test file here instead. For context, the file is created exactly from the reproducer in the Bugzilla report above.

Edit: Reduced test file contents.

	# RUN: llvm-mca %s
	# LLVM-MCA-BEGIN foo
	addl	$42, %eax
	# LLVM-MCA-END

Added text diff of test file to review.

Harbormaster completed remote builds in B119889: Diff 366882. Aug 17 2021, 6:32 AM

Sounds reasonable to me.

This revision is now accepted and ready to land. Aug 17 2021, 7:29 AM

This revision was landed with ongoing or failed builds. Aug 17 2021, 7:53 AM

Closed by commit rG5c6f748cbc17: [MCParser] Correctly handle CRLF line ends when consuming line comments (authored by StephenTozer).

This revision was automatically updated to reflect the committed changes.

StephenTozer added a commit: rG5c6f748cbc17: [MCParser] Correctly handle CRLF line ends when consuming line comments.

AsmCommentConsumer::HandleComment() is documented as excluding "the newline for single-line comments" which I think is reasonable to interpret as excluding the entire CRLF. So, the code change LGTM.

The only AsmCommentConsumer providers that I see in-tree are in llvm-mca and llvm-exegesis; the latter avoids the problem because it calls StringRef::trim() on the provided comment as the first thing it does in its HandleComment() override. It looks like llvm-mca doesn't do that, so the validity of the test is actually dependent on llvm-mca being less deft with its string handling than llvm-exegesis. That doesn't seem especially robust.
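To illustrate why trimming hides the problem, here is a minimal standalone sketch of a consumer that trims before parsing. It is not the llvm-mca or llvm-exegesis implementation: RegionEndConsumer and trimComment are hypothetical names, and trimComment only mirrors StringRef::trim() with its default whitespace set, using std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Standalone analogue of StringRef::trim() with its default character set.
std::string trimComment(const std::string &S) {
  static const char *const Whitespace = " \t\n\v\f\r";
  const std::size_t First = S.find_first_not_of(Whitespace);
  if (First == std::string::npos)
    return std::string(); // Nothing but whitespace.
  const std::size_t Last = S.find_last_not_of(Whitespace);
  assert(Last >= First && "non-whitespace range must be non-empty");
  return S.substr(First, Last - First + 1);
}

// Hypothetical consumer, not the llvm-mca or llvm-exegesis implementation.
// It records the label that follows an "LLVM-MCA-END" marker.
struct RegionEndConsumer {
  bool SawEnd = false;
  std::string Label;

  void handleComment(const std::string &Raw) {
    const std::string Marker = "LLVM-MCA-END";
    const std::string Text = trimComment(Raw); // Drops a stray '\r' too.
    if (Text.compare(0, Marker.size(), Marker) != 0)
      return; // Not a region-end directive.
    SawEnd = true;
    Label = trimComment(Text.substr(Marker.size()));
  }
};
```

A consumer written this way sees an empty label whether or not the lexer leaks the carriage return, so a test that goes through it cannot detect the lexer bug. That is the robustness concern with relying on a tool-level test alone.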

Really what you want here is a unittest that exercises the HandleComment API. The only existing asm lexer test I see is llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp which you might be able to leverage; really you want a target-independent test, though, and the SystemZ test infrastructure does a bunch of things you don't need.

I see @andreadb already approved, so please take this as a follow-up.

Not quite sure why, but this breaks tests on my m1 bot: http://45.33.8.238/macm1/16248/step_11.txt

Probably the -mtriple=x86_64 means this needs a REQUIRES: line -- or the test should move into llvm/test/tools/llvm-mca/X86?

Please take a look :)

In D90234#2949482, @thakis wrote:

Not quite sure why, but this breaks tests on my m1 bot: http://45.33.8.238/macm1/16248/step_11.txt

Probably the -mtriple=x86_64 means this needs a REQUIRES: line -- or the test should move into llvm/test/tools/llvm-mca/X86?

Please take a look :)

Indeed, my mistake - I saw that error and have pushed a fix up for it (moving the test into X86).

Seeing some further issues on buildbots due to the absence of a -mcpu flag passed into the command, fixing again.

In D90234#2949630, @StephenTozer wrote:

Seeing some further issues on buildbots due to the absence of a -mcpu flag passed into the command, fixing again.

Happy now. Thanks for the fix!

Revision Contents

Path	Size
llvm/.gitattributes	3 lines
llvm/lib/MC/MCParser/AsmLexer.cpp	3 lines
llvm/test/tools/llvm-mca/directives-handle-crlf.s	4 lines

Diff 366898

llvm/.gitattributes

	 # binary files
	 test/Object/Inputs/*.a-* binary
	 test/tools/dsymutil/Inputs/*.o binary
	 test/tools/dsymutil/Inputs/*.a binary
	 test/tools/dsymutil/Inputs/*.i386 binary
	 test/tools/dsymutil/Inputs/*.x86_64 binary
	 test/tools/dsymutil/Inputs/*.armv7m binary
	 test/tools/dsymutil/Inputs/*.dylib binary
	 test/tools/llvm-ar/Inputs/*.lib binary
	 test/tools/llvm-objdump/Inputs/*.a binary
	 test/tools/llvm-rc/Inputs/* binary
	 test/tools/llvm-strings/Inputs/numbers binary
	 test/MC/AsmParser/incbin_abcd binary
	 test/YAMLParser/spec-09-02.test binary

	-# This file must have CRLF line endings, therefore git should treat it as
	+# These files must have CRLF line endings, therefore git should treat them as
	 # binary and not autoconvert line endings (for example, when core.autocrlf is
	 # on).
	 test/MC/AsmParser/preserve-comments-crlf.s binary
	+test/tools/llvm-mca/directives-handle-crlf.s binary

llvm/lib/MC/MCParser/AsmLexer.cpp

	 AsmToken AsmLexer::LexLineComment() {
	   // Mark This as an end of statement with a body of the
	   // comment. While it would be nicer to leave this two tokens,
	   // backwards compatability with TargetParsers makes keeping this in this form
	   // better.
	   const char *CommentTextStart = CurPtr;
	   int CurChar = getNextChar();
	   while (CurChar != '\n' && CurChar != '\r' && CurChar != EOF)
	     CurChar = getNextChar();
	+  const char *NewlinePtr = CurPtr;
	   if (CurChar == '\r' && CurPtr != CurBuf.end() && *CurPtr == '\n')
	     ++CurPtr;

	   // If we have a CommentConsumer, notify it about the comment.
	   if (CommentConsumer) {
	     CommentConsumer->HandleComment(
	         SMLoc::getFromPointer(CommentTextStart),
	-        StringRef(CommentTextStart, CurPtr - 1 - CommentTextStart));
	+        StringRef(CommentTextStart, NewlinePtr - 1 - CommentTextStart));
	   }

	   IsAtStartOfLine = true;
	   // This is a whole line comment. leave newline
	   if (IsAtStartOfStatement)
	     return AsmToken(AsmToken::EndOfStatement,
	                     StringRef(TokStart, CurPtr - TokStart));
	   IsAtStartOfStatement = true;

llvm/test/tools/llvm-mca/directives-handle-crlf.s

This file was added.

				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown %s
				# LLVM-MCA-BEGIN foo
				addl $42, %eax
				# LLVM-MCA-END