This is an archive of the discontinued LLVM Phabricator instance.

Differential D118471

[clang][Lexer] Make raw and normal lexer behave the same for line comments
Closed, Public

Authored by kadircet on Jan 28 2022, 7:24 AM.

Details

Reviewers

sammccall

Commits

rGff77071a4d67: [clang][Lexer] Make raw and normal lexer behave the same for line comments

Summary

Normally there are heuristics in the lexer to treat //* specially in
language modes that don't have line comments (to emit /). Unfortunately this
only applied up to the first occurrence of a line comment inside the file, as
the subsequent line comments were treated as if the language had support for them.

This unfortunately only holds in normal lexing mode, as in raw mode all
occurrences of line comments received this treatment, which created discrepancies
when comparing expanded and spelled tokens.

The proper fix would be to just make sure we treat all the line comments with a
subsequent * the same way, but it would imply breaking some code that's
accepted by clang today. So instead we introduce the same bug into raw lexing
mode.

Fixes https://github.com/clangd/clangd/issues/1003.
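
The behaviour described above can be illustrated with a small, self-contained model. This is only a sketch: `ToyLexer`, `Kind`, and `classifyAll` are hypothetical names, not clang APIs, and the model tracks nothing but a flag standing in for `LangOpts.LineComment`. It assumes that a `//` not followed by `*` is always skipped as a comment, and that `//*` is lexed as `/` followed by a block comment while the flag is off.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model; none of these names exist in clang.
enum class Kind { LineComment, SlashThenBlockComment };

struct ToyLexer {
  bool LineCommentEnabled; // stands in for LangOpts.LineComment
  bool RawMode;            // stands in for isLexingRawMode()
  bool Patched;            // true models the behaviour after this revision

  // Classifies one line that starts with "//".
  Kind classify(const std::string &Line) {
    bool TreatAsComment = LineCommentEnabled;
    if (!TreatAsComment)
      TreatAsComment = Line.size() < 3 || Line[2] != '*';
    if (!TreatAsComment)
      return Kind::SlashThenBlockComment;
    // Before the patch only normal mode flipped the flag. After it, raw mode
    // flips it too (it just cannot emit the extension warning).
    if (!LineCommentEnabled && (Patched || !RawMode))
      LineCommentEnabled = true;
    return Kind::LineComment;
  }
};

// Runs one lexer over all lines in order, so the flag carries over.
std::vector<Kind> classifyAll(ToyLexer L,
                              const std::vector<std::string> &Lines) {
  std::vector<Kind> Out;
  for (const std::string &Line : Lines)
    Out.push_back(L.classify(Line));
  return Out;
}
```

In this model, for the input `// first` followed by `//* second`, normal mode treats both lines as line comments, unpatched raw mode treats the second one as `/` plus a block comment, and patched raw mode agrees with normal mode.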

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kadircet created this revision. Jan 28 2022, 7:24 AM

Herald added a subscriber: usaxena95. Jan 28 2022, 7:24 AM

kadircet requested review of this revision. Jan 28 2022, 7:24 AM

Herald added a project: Restricted Project. Jan 28 2022, 7:24 AM

Herald added subscribers: cfe-commits, ilya-biryukov.

Harbormaster completed remote builds in B146286: Diff 404019. Jan 28 2022, 8:49 AM

sammccall added inline comments. Jan 31 2022, 6:25 AM

clang/lib/Lex/Lexer.cpp
2387	It's much less obvious what this does in raw lexing mode. My rough understanding is raw lexing mode is ~stateless and we may choose to recreate the lexer, start lexing at any point etc. So having a stateful flag flip won't actually result in the same text being lexed the same way, because we may not run in the same sequence.

sammccall accepted this revision. Jan 31 2022, 6:47 AM

sammccall added inline comments.

clang/lib/Lex/Lexer.cpp
2387	OK, discussed offline:
- if lexing the whole file in a loop (motivating case), this will DTRT
- if lexing one token and throwing away the lexer, modifying LineComment is a no-op
- if lexing some range, we're going to be stateful but with possibly-wrong initial state

As I understand, this fixes case 1, leaves 2 broken, and slightly changes the way in which 3 is broken, which seems OK. Like you say, the right fix is to take the statefulness out of the language extension, which can happen after the branch.
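
The order dependence discussed in this comment can be sketched with a toy model of the post-patch behaviour. `lexRange` is a hypothetical helper, not part of clang. It assumes every line in the input starts with `//` and that the flag standing in for `LangOpts.LineComment` starts out false unless the caller says otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model of the post-patch behaviour; not clang's API.
// For each "//" line in [Begin, End), records whether it is treated as a
// line comment (true) or as '/' followed by a block comment (false).
std::vector<bool> lexRange(const std::vector<std::string> &Lines,
                           std::size_t Begin, std::size_t End,
                           bool LineCommentEnabled = false) {
  std::vector<bool> IsLineComment;
  for (std::size_t I = Begin; I < End; ++I) {
    const std::string &Line = Lines[I];
    bool StarFollows = Line.size() > 2 && Line[2] == '*';
    bool TreatAsComment = LineCommentEnabled || !StarFollows;
    if (TreatAsComment)
      LineCommentEnabled = true; // the stateful flip discussed above
    IsLineComment.push_back(TreatAsComment);
  }
  return IsLineComment;
}
```

Under these assumptions, lexing `// first` and `//* second` from the start treats the second line as a line comment, while a fresh lexer started at the second line does not, because its initial state never saw the first comment.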

This revision is now accepted and ready to land. Jan 31 2022, 6:47 AM

This revision was landed with ongoing or failed builds. Jan 31 2022, 7:15 AM

Closed by commit rGff77071a4d67: [clang][Lexer] Make raw and normal lexer behave the same for line comments (authored by kadircet).

This revision was automatically updated to reflect the committed changes.

kadircet marked 2 inline comments as done.

kadircet added a commit: rGff77071a4d67: [clang][Lexer] Make raw and normal lexer behave the same for line comments.

probinson added a subscriber: probinson. Feb 4 2022, 3:03 PM

probinson added inline comments.

clang/unittests/Lex/LexerTest.cpp
654	@kadircet @sammccall It turns out this while loop is zero-trip; the test assertions in the body are never executed. Replace it with `assert(false);` and the test doesn't crash. That means `ToksView` is empty from the start. This is probably not what you wanted? In other words, the test does not exercise the patch. This is pretty serious. It needs to be fixed or reverted.

kadircet added inline comments. Feb 7 2022, 4:15 AM

clang/unittests/Lex/LexerTest.cpp
654	thanks! the discrepancy was actually having tokens in one but not the other, hence tests were failing initially but after the fix the tests passed (by making both lexing modes return none) so it skipped my attention. sending out a fix now.
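
The zero-trip loop described in this thread can be reproduced with a toy stand-in. `ToyRawLexer` and `countLoopTrips` are hypothetical names, and the model assumes a lex call that returns true once the end of the buffer is reached with no token left to hand out. It is a simplification and does not claim to match the real lexer's return-value semantics exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a raw lexer; not clang's API.
struct ToyRawLexer {
  std::vector<std::string> Tokens; // tokens left once comments are skipped
  std::size_t Next;                // index of the next token to return

  // Stores the next token in Out. Returns true when the end of the buffer
  // was reached and only "eof" could be produced.
  bool lex(std::string &Out) {
    if (Next == Tokens.size()) {
      Out = "eof";
      return true;
    }
    Out = Tokens[Next++];
    return false;
  }
};

// Counts how many times a `while (!L.lex(T))` loop body would run.
std::size_t countLoopTrips(ToyRawLexer L) {
  std::size_t Trips = 0;
  std::string T;
  while (!L.lex(T))
    ++Trips;
  return Trips;
}
```

With a comment-only input there are no tokens, so the loop body never runs and any assertions inside it are vacuous. Checking that the expected token list is non-empty before the loop is one way to catch this.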

Revision Contents

Path	Size
clang/lib/Lex/Lexer.cpp	5 lines
clang/unittests/Lex/LexerTest.cpp	25 lines

Diff 404514

clang/lib/Lex/Lexer.cpp

(2,372 lines not shown)
 /// return.
 ///
 /// If we're in KeepCommentMode or any CommentHandler has inserted
 /// some tokens, this will store the first token and return true.
 bool Lexer::SkipLineComment(Token &Result, const char *CurPtr,
                             bool &TokAtPhysicalStartOfLine) {
   // If Line comments aren't explicitly enabled for this language, emit an
   // extension warning.
-  if (!LangOpts.LineComment && !isLexingRawMode()) {
-    Diag(BufferPtr, diag::ext_line_comment);
+  if (!LangOpts.LineComment) {
+    if (!isLexingRawMode()) // There's no PP in raw mode, so can't emit diags.
+      Diag(BufferPtr, diag::ext_line_comment);

     // Mark them enabled so we only emit one warning for this translation
     // unit.
     LangOpts.LineComment = true;
   }

   // Scan over the body of the comment. The common case, when scanning, is that
   // the comment contains normal ascii characters with nothing interesting in
   // them. As such, optimize for this case with the inner loop.
   //
   // This loop terminates with CurPtr pointing at the newline (or end of buffer)
   // character that ends the line comment.
(1,706 lines not shown)

clang/unittests/Lex/LexerTest.cpp

(17 lines not shown)
 #include "clang/Basic/TokenKinds.h"
 #include "clang/Lex/HeaderSearch.h"
 #include "clang/Lex/HeaderSearchOptions.h"
 #include "clang/Lex/MacroArgs.h"
 #include "clang/Lex/MacroInfo.h"
 #include "clang/Lex/ModuleLoader.h"
 #include "clang/Lex/Preprocessor.h"
 #include "clang/Lex/PreprocessorOptions.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/StringRef.h"
 #include "gmock/gmock.h"
 #include "gtest/gtest.h"
 #include <memory>
 #include <vector>

 namespace {
 using namespace clang;
 using testing::ElementsAre;
(593 lines not shown)
   while (1) {
     Token tok;
     PP->Lex(tok);
     if (tok.is(tok::eof))
       break;
   }
   EXPECT_EQ(SourceMgr.getNumCreatedFIDsForFileID(PP->getPredefinesFileID()),
             1U);
 }
+
+TEST_F(LexerTest, RawAndNormalLexSameForLineComments) {
+  const llvm::StringLiteral Source = R"cpp(
+  // First line comment.
+  //* Second line comment which is ambigious.
+  )cpp";
+  LangOpts.LineComment = false;
+  auto Toks = Lex(Source);
+  auto &SM = PP->getSourceManager();
+  auto SrcBuffer = SM.getBufferData(SM.getMainFileID());
+  Lexer L(SM.getLocForStartOfFile(SM.getMainFileID()), PP->getLangOpts(),
+          SrcBuffer.data(), SrcBuffer.data(),
+          SrcBuffer.data() + SrcBuffer.size());
+
+  auto ToksView = llvm::makeArrayRef(Toks);
+  clang::Token T;
+  while (!L.LexFromRawLexer(T)) {
+    ASSERT_TRUE(!ToksView.empty());
+    EXPECT_EQ(T.getKind(), ToksView.front().getKind());
+    ToksView = ToksView.drop_front();
+  }
+  EXPECT_TRUE(ToksView.empty());
+}
 } // anonymous namespace