This is an archive of the discontinued LLVM Phabricator instance.

[MCParser] Correctly handle Windows line-endings when consuming lexed line comments
ClosedPublic

Authored by StephenTozer on Oct 27 2020, 8:09 AM.

Details

Summary

Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=47983

The AsmLexer has a function LexLineComment that, as part of the lexing, passes the contents of the comment to a CommentConsumer if one exists. The passed comment is meant to exclude newline characters, but it does this by taking the range from the start of the comment inclusive to the last newline exclusive; this works with Unix line-endings, which are a single character, but fails when used with Windows line-endings, in which case the carriage return will be included as part of the passed comment. This causes an issue with llvm-mca, as it reads directives which have no label as directives with the label \r, but may result in inconsistent behaviour for any consumer when switching between line ending styles.

Diff Detail

Event Timeline

StephenTozer created this revision.Oct 27 2020, 8:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 27 2020, 8:09 AM
StephenTozer requested review of this revision.Oct 27 2020, 8:09 AM

I've added a test for the symptom that revealed this bug in llvm-mca. I'm also writing a unit test for AsmLexer that tests the underlying behaviour by verifying that CommentConsumers will not be sent characters that are not part of the line comment, since the problem is not specific to llvm-mca (although it's the only place that has seen an error so far, as far as I can tell).

Ping - there may be room for more testing further down the line, but upon reflection I think the current test is sufficient for this patch.

StephenTozer added a comment.EditedAug 17 2021, 5:49 AM

As the test needs to have its CRLF line endings preserved, it needs to be checked into git as a binary file - unfortunately, this means that the contents of the file are not uploaded to phabricator. Instead, I'll paste the contents of the test file here. For context, the file is created exactly from the reproducer in the buzilla report above.

Edit: Reduced test file contents.

	# RUN: llvm-mca %s
	# LLVM-MCA-BEGIN foo
	addl	$42, %eax
	# LLVM-MCA-END

Added text diff of test file to review.

andreadb accepted this revision.Aug 17 2021, 7:29 AM

Sounds reasonable to me.

This revision is now accepted and ready to land.Aug 17 2021, 7:29 AM
This revision was landed with ongoing or failed builds.Aug 17 2021, 7:53 AM
This revision was automatically updated to reflect the committed changes.

AsmCommentConsumer::HandleComment() is documented as excluding "the newline for single-line comments" which I think is reasonable to interpret as excluding the entire CRLF. So, the code change LGTM.

The only AsmCommentConsumer providers that I see in-tree are in llvm-mca and llvm-exegesis; the latter avoids the problem because it calls StringRef::trim() on the provided comment as the first thing it does in its HandleComment() override. It looks like llvm-mca doesn't do that, so the validity of the test is actually dependent on llvm-mca being less deft with its string handling than llvm-exegesis. That doesn't seem especially robust.

Really what you want here is a unittest that exercises the HandleComment API. The only existing asm lexer test I see is llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp which you might be able to leverage; really you want a target-independent test, though, and the SystemZ test infrastructure does a bunch of things you don't need.

I see @andreadb already approved, so please take this as a follow-up.

thakis added a subscriber: thakis.Aug 17 2021, 8:11 AM

Not quite sure why, but this breaks tests on my m1 bot: http://45.33.8.238/macm1/16248/step_11.txt

Probably the -mtriple=x86_64 means this needs a REQUIRES: line -- or the test should move into llvm/test/tools/llvm-mca/X86?

Please take a look :)

StephenTozer added a comment.EditedAug 17 2021, 8:22 AM

Not quite sure why, but this breaks tests on my m1 bot: http://45.33.8.238/macm1/16248/step_11.txt

Probably the -mtriple=x86_64 means this needs a REQUIRES: line -- or the test should move into llvm/test/tools/llvm-mca/X86?

Please take a look :)

Indeed, my mistake - I saw that error and have pushed a fix up for it (moving the test into X86).

Seeing some further issues on buildbots due to the absence of a -mcpu flag passed into the command, fixing again.

Seeing some further issues on buildbots due to the absence of a -mcpu flag passed into the command, fixing again.

Happy now. Thanks for the fix!