This is an archive of the discontinued LLVM Phabricator instance.

Differential D46000

[AST] Added a helper to extract a user-friendly text of a comment.
ClosedPublic

Authored by ilya-biryukov on Apr 24 2018, 4:52 AM.

Details

Reviewers

sammccall
hokein
ioeric

Commits

rG1ff7c32fc91c: [AST] Added a helper to extract a user-friendly text of a comment.
rC332458: [AST] Added a helper to extract a user-friendly text of a comment.
rL332458: [AST] Added a helper to extract a user-friendly text of a comment.

Summary

The helper is used in clangd for documentation shown in code completion
and storing the docs in the symbols. See D45999.

This patch reuses the code of the Doxygen comment lexer, disabling the
bits that do command and html tag parsing.
The new helper works on all comments, including non-doxygen comments
and is faster. However, it does not understand or transform any
doxygen directives, i.e. cannot extract brief text, etc.

Diff Detail

Repository: rL LLVM

Event Timeline

ilya-biryukov created this revision. Apr 24 2018, 4:52 AM

Harbormaster completed remote builds in B17357: Diff 143711. Apr 24 2018, 4:52 AM

ilya-biryukov added a child revision: D45999: [clangd] Retrieve minimally formatted comment text in completion. Apr 24 2018, 4:54 AM

Added forgotten bits of the change

ilya-biryukov mentioned this in D46001: [CodeComplete] Expose helpers to get RawComment of completion result. Apr 25 2018, 3:01 AM

Overall looks good. Could you add tests for the new methods?

lib/AST/CommentLexer.cpp
294 ↗	(On Diff #143881)	micro-nit: I'd probably return ParseCommands ? lexWithCommands(T) : lexWithoutCommands(T);
471 ↗	(On Diff #143881)	Can we share code with `lexCommentTextWithCommands` for these two common cases?
lib/AST/RawCommentList.cpp
353 ↗	(On Diff #143881)	nit: `SkipWhitespaces` for readability?
380 ↗	(On Diff #143881)	Explain when this would be invalid and why `TokColumn = 0` is used?
383 ↗	(On Diff #143881)	nit: `unsigned MaxSkip = IsFirstLine ? ... : ...;`
392 ↗	(On Diff #143881)	I'd probably make `SkipWs` return the number of white spaces skipped and do the drop-front here, so that you could simplify the awkward calculation of `IndentColumn` below.

In D46000#1077926, @ioeric wrote:

Overall looks good. Could you add tests for the new methods?

Sure. There are a few tests in D46002, but I haven't (yet) moved them to clang.

Attempt to reuse lexing code with/without command parsing.
Get rid of SkipWs.

lib/AST/CommentLexer.cpp
471 ↗	(On Diff #143881)	I couldn't come up with a way to do that previously. Made another attempt which seems to work. Please take a look, the change is somewhat non-trivial (includes removing the loop that seems redundant).
lib/AST/RawCommentList.cpp
380 ↗	(On Diff #143881)	I don't know whether this can even be invalid, but I'm not confident enough to add an assert there. `TokColumn = 0` seems like a reasonable way to recover if we can't compute the column number, i.e. assume the line starts at the first column if SourceLocation of the line was invalid for any reason. This whole column thing looks weird to me, maybe I should just remove it altogether and just remove the same amount of whitespace in all the lines. WDYT?
383 ↗	(On Diff #143881)	That would force me to get rid of the comments in the if branches, but they seem to be useful. Am I missing an obvious style that would preserve the comments?
392 ↗	(On Diff #143881)	Got rid of it altogether. The code seems clearer now, thanks for the suggestion!

Update a comment after latest changes

Harbormaster completed remote builds in B17411: Diff 143929. Apr 25 2018, 7:26 AM

Fix indentation

Harbormaster completed remote builds in B17412: Diff 143930. Apr 25 2018, 7:28 AM

ilya-biryukov added inline comments. Apr 25 2018, 8:28 AM

lib/AST/RawCommentList.cpp
380 ↗	(On Diff #143881)	On second thought, now I remember why I added this in the first place. To support the following example we want to take column numbers into account: class Foo { /* A block comment spanning multiple lines has too many spaces on the all lines except the first one. */ int func(); };
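
The column-based stripping described in this comment can be illustrated with a small standalone sketch. It does not use the clang API: `CommentLine` and `stripIndent` are hypothetical names invented here, and the start column of each line is passed in by hand instead of being computed from a `SourceLocation`. The first line fixes the indent column at its first non-blank character; later lines lose whitespace only up to that column, so extra indentation survives.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for one lexed comment line: the text of the line and
// the 1-based source column at which that text starts.
struct CommentLine {
  unsigned StartColumn;
  std::string_view Text;
};

// Joins the lines with '\n', removing indentation relative to the column of
// the first non-blank character on the first line.
std::string stripIndent(const std::vector<CommentLine> &Lines) {
  std::string Result;
  unsigned IndentColumn = 0;
  for (std::size_t I = 0; I < Lines.size(); ++I) {
    const CommentLine &Line = Lines[I];
    assert(Line.StartColumn >= 1 && "columns are 1-based");

    // Amount of leading whitespace in this line.
    std::size_t WhitespaceLen = Line.Text.find_first_not_of(" \t");
    if (WhitespaceLen == std::string_view::npos)
      WhitespaceLen = Line.Text.size();

    std::size_t SkipLen = WhitespaceLen; // First line: drop all of it.
    if (I == 0) {
      IndentColumn = Line.StartColumn + static_cast<unsigned>(WhitespaceLen);
    } else {
      // Later lines: drop whitespace only up to IndentColumn, and never
      // more than the line actually has.
      std::size_t UpToIndent =
          IndentColumn > Line.StartColumn ? IndentColumn - Line.StartColumn : 0;
      SkipLen = std::min(WhitespaceLen, UpToIndent);
      Result += '\n';
    }
    Result += Line.Text.substr(SkipLen);
  }
  return Result;
}
```

For a block comment whose text starts at column 5 on the first line and at column 1 on the following lines, the continuation lines lose exactly the whitespace that aligns them with the first line, and anything beyond that is kept.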

ioeric added inline comments. Apr 25 2018, 11:39 AM

include/clang/AST/RawCommentList.h
118 ↗	(On Diff #143930)	I'm trying to understand how these cases and RawComment work. For this case, are the `// ...` block and `/* ... */` merged in one `RawComment` by default?
126 ↗	(On Diff #143930)	Are the `*`s in each lines automatically consumed by the lexer?
lib/AST/CommentLexer.cpp
301 ↗	(On Diff #143930)	I think we could avoid the "somewhat non-trivial" control flow by merging command and command-less cases in one function. Something like: const char *TokenPtr = ...; auto HandleNonCommandToken = ...; if (!ParseCommands) { HandleNonCommandToken(...); return; } ... switch (...) { // after all command cases default: HandleNonCommandToken(...); }
331 ↗	(On Diff #143930)	We should be extra careful about removing the loop... (It does seem to be redundant though)
lib/AST/RawCommentList.cpp
343 ↗	(On Diff #143930)	nit: We don't really know if there was a failure or the comment is simply empty. So I'd probably leave out the comment here to avoid confusion.
349 ↗	(On Diff #143930)	`s/ParseCommentText/ParseCommands/`?
352 ↗	(On Diff #143930)	This variable could use a comment.
393 ↗	(On Diff #143930)	I think `MaxSkip` could be removed with: llvm::StringRef Trimmed = TokText.drop_front(IsFirstLine ? WhitespaceLen : std::max((int)IndentColumn - (int)TokColumn, 0));
380 ↗	(On Diff #143881)	So I think you would want to force `MaxSkip` to 0 if token loc is invalid to make sure no comment is accidentally eaten?
383 ↗	(On Diff #143881)	You could simply merge the comments, which doesn't seem to compromise readability.
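
The control flow suggested for `lexCommentText` (one lambda for plain text and newlines, an early return when commands are disabled, and a `default:` case that falls back to the same lambda) can be sketched on a toy lexer. Everything below is a hypothetical miniature written for illustration: `MiniLexer`, `Tok`, `TokKind` and `lexAll` are not clang names, and the command handling is reduced to "marker followed by letters".

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

enum class TokKind { Text, Newline, Command };

struct Tok {
  TokKind Kind = TokKind::Text;
  std::string Spelling;
};

// Toy lexer: only the shape of lex() mirrors the review suggestion.
class MiniLexer {
public:
  MiniLexer(std::string_view Buffer, bool ParseCommands)
      : Buf(Buffer), ParseCommands(ParseCommands) {}

  // Returns false at end of input, otherwise stores the next token in T.
  bool lex(Tok &T) {
    if (Pos == Buf.size())
      return false;

    // Shared by both modes: lexes a newline or a run of plain text.
    auto HandleNonCommandToken = [&] {
      if (Buf[Pos] == '\n') {
        form(T, TokKind::Newline, Pos + 1);
        return;
      }
      // A text token ends only at characters that start another token in
      // the current mode, so each call is guaranteed to consume input.
      std::string_view Stops = ParseCommands ? "\n\\@" : "\n";
      std::size_t End = Buf.find_first_of(Stops, Pos);
      form(T, TokKind::Text, End == std::string_view::npos ? Buf.size() : End);
    };

    if (!ParseCommands) {
      HandleNonCommandToken();
      return true;
    }

    switch (Buf[Pos]) {
    case '\\':
    case '@': {
      std::size_t End = Pos + 1;
      while (End < Buf.size() &&
             std::isalpha(static_cast<unsigned char>(Buf[End])))
        ++End;
      // A lone marker becomes text; zero-length commands are never formed.
      form(T, End == Pos + 1 ? TokKind::Text : TokKind::Command, End);
      return true;
    }
    default:
      HandleNonCommandToken();
      return true;
    }
  }

private:
  void form(Tok &T, TokKind Kind, std::size_t End) {
    assert(End > Pos && "every token must consume input");
    T = {Kind, std::string(Buf.substr(Pos, End - Pos))};
    Pos = End;
  }

  std::string_view Buf;
  bool ParseCommands;
  std::size_t Pos = 0;
};

// Convenience helper for the example: lexes the whole buffer.
std::vector<Tok> lexAll(std::string_view Text, bool ParseCommands) {
  MiniLexer L(Text, ParseCommands);
  std::vector<Tok> Toks;
  for (Tok T; L.lex(T);)
    Toks.push_back(T);
  return Toks;
}
```

With command parsing on, `a \brief b` splits into text, command and text tokens; with it off, the whole line is a single text token. Picking the stop characters per mode is what keeps the command-less path from stopping at a character it would never consume.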

ilya-biryukov marked 4 inline comments as done. Apr 26 2018, 2:32 AM

ilya-biryukov added inline comments.

include/clang/AST/RawCommentList.h
118 ↗	(On Diff #143930)	Yes, `RawComment` can represent multiple merged comments of different styles.
126 ↗	(On Diff #143930)	Yes, see `comments::Lexer::skipLineStartingDecorations()`
lib/AST/RawCommentList.cpp
343 ↗	(On Diff #143930)	`getRawTextSlow` returns empty string on error, so this comment has some ground. Removed it anyway
349 ↗	(On Diff #143930)	Thanks for catching this!
393 ↗	(On Diff #143930)	I've removed `MaxSkip`, but introduced `SkipLen` instead, which captures the actual number of chars we want to skip. I would still keep it as a separate variable, as the expression seems somewhat complex and giving it a name arguably makes the code more readable.
380 ↗	(On Diff #143881)	Makes sense, thanks. Unfortunately I don't have any ideas on how we can test this case :-(

Remove tryLexCommands(), call into helper that parses commands directly
Addressed other review comments

Harbormaster completed remote builds in B17440: Diff 144083. Apr 26 2018, 2:32 AM

Looks good. We still need tests though :)

lib/AST/RawCommentList.cpp
376 ↗	(On Diff #144083)	This is a bit confusing... Could you please add comments about the behavior here (as chatted offline)?
394 ↗	(On Diff #144083)	use `static_cast` instead of conversions.
405 ↗	(On Diff #144083)	I think it's end of file?

Move unit tests from clangd code to AST tests
Assert locations are valid
Address other review comments

Herald added a subscriber: mgorny. May 8 2018, 6:38 AM

ilya-biryukov added inline comments. May 8 2018, 6:43 AM

lib/AST/RawCommentList.cpp
376 ↗	(On Diff #144083)	After thinking about it for a while, decided to add an assert that location was valid instead. Invalid locations don't make any sense there, since we won't be able to get the comment text in case of invalid locs.
394 ↗	(On Diff #144083)	Done. Rewrote the code to reduce the number of casts too. It was unreadable with 3 static casts and default formatting.

Fixed infinite loop with comments that contain doxygen commands

Thanks for adding the tests!

include/clang/AST/RawCommentList.h
138 ↗	(On Diff #145691)	I think we can get rid of the interface that takes `ASTContext`? If `SourceManager` and `Diags` are sufficient, I don't see why we would want another interface for ASTContext.
lib/AST/RawCommentList.cpp
352 ↗	(On Diff #145691)	I'm not quite sure about this. Could we just require a `CommandTraits` in the interface? And only make this assumption in tests?
unittests/AST/CommentTextTest.cpp
32 ↗	(On Diff #145691)	`SourceManagerForFile` added in D46176 should save you a few lines here. (I'm landing it right now...)

Simplify test code with SourceManagerForFile.

Harbormaster completed remote builds in B17990: Diff 146326. May 11 2018, 7:35 AM

ilya-biryukov added inline comments. May 11 2018, 7:35 AM

include/clang/AST/RawCommentList.h
138 ↗	(On Diff #145691)	Two reasons that come to mind: it's simpler to use and follows the API of `getBriefText`. If not for mocking the tests, I would totally only keep the `ASTContext`-based one, since it does not really make any sense to create `RawComment` without `ASTContext` for any reason other than testing.
lib/AST/RawCommentList.cpp
352 ↗	(On Diff #145691)	I think we shouldn't add this to params, the whole point of this function is to do parsing that ignores the commands and the `CommandTraits`. The fact that lexer still needs them is because we haven't extracted a simpler interface from `Lexer` that doesn't rely on unused params, i.e. `CommandTraits` and `Allocator`.
unittests/AST/CommentTextTest.cpp
32 ↗	(On Diff #145691)	Thanks!

Removed the overload that accepts ASTContext

Harbormaster completed remote builds in B18089: Diff 146757. May 15 2018, 1:40 AM

ilya-biryukov marked 2 inline comments as done. May 15 2018, 1:40 AM

lgtm

include/clang/AST/RawCommentList.h
138 ↗	(On Diff #145691)	As discussed offline, two interfaces that do the same thing are a bit confusing. I would still suggest favoring minimal API and test-ability over simplified usage - two parameters aren't that better than one after all :)
lib/AST/RawCommentList.cpp
352 ↗	(On Diff #145691)	Makes sense.

This revision is now accepted and ready to land. May 15 2018, 1:41 AM

Closed by commit rL332458: [AST] Added a helper to extract a user-friendly text of a comment. (authored by ibiryukov). May 16 2018, 5:34 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. May 16 2018, 5:34 AM

Revision Contents

Path

Size

cfe/

trunk/

include/

clang/

AST/

CommentLexer.h

21 lines

RawCommentList.h

24 lines

lib/

AST/

CommentLexer.cpp

242 lines

RawCommentList.cpp

91 lines

unittests/

AST/

CMakeLists.txt

1 line

CommentTextTest.cpp

122 lines

Diff 147062

cfe/trunk/include/clang/AST/CommentLexer.h

[275 lines not shown]	private:

/// Current lexing mode.		/// Current lexing mode.
LexerState State;		LexerState State;

/// If State is LS_VerbatimBlock, contains the name of verbatim end		/// If State is LS_VerbatimBlock, contains the name of verbatim end
/// command, including command marker.		/// command, including command marker.
SmallString<16> VerbatimBlockEndCommandName;		SmallString<16> VerbatimBlockEndCommandName;

		/// If true, the commands, html tags, etc will be parsed and reported as
		/// separate tokens inside the comment body. If false, the comment text will
		/// be parsed into text and newline tokens.
		bool ParseCommands;

/// Given a character reference name (e.g., "lt"), return the character that		/// Given a character reference name (e.g., "lt"), return the character that
/// it stands for (e.g., "<").		/// it stands for (e.g., "<").
StringRef resolveHTMLNamedCharacterReference(StringRef Name) const;		StringRef resolveHTMLNamedCharacterReference(StringRef Name) const;

/// Given a Unicode codepoint as base-10 integer, return the character.		/// Given a Unicode codepoint as base-10 integer, return the character.
StringRef resolveHTMLDecimalCharacterReference(StringRef Name) const;		StringRef resolveHTMLDecimalCharacterReference(StringRef Name) const;

/// Given a Unicode codepoint as base-16 integer, return the character.		/// Given a Unicode codepoint as base-16 integer, return the character.
[18 lines not shown]	private:

DiagnosticBuilder Diag(SourceLocation Loc, unsigned DiagID) {		DiagnosticBuilder Diag(SourceLocation Loc, unsigned DiagID) {
return Diags.Report(Loc, DiagID);		return Diags.Report(Loc, DiagID);
}		}

/// Eat string matching regexp \code \s*\* \endcode.		/// Eat string matching regexp \code \s*\* \endcode.
void skipLineStartingDecorations();		void skipLineStartingDecorations();

/// Lex stuff inside comments. CommentEnd should be set correctly.		/// Lex comment text, including commands if ParseCommands is set to true.
void lexCommentText(Token &T);		void lexCommentText(Token &T);

void setupAndLexVerbatimBlock(Token &T,		void setupAndLexVerbatimBlock(Token &T, const char *TextBegin, char Marker,
const char *TextBegin,		const CommandInfo *Info);
char Marker, const CommandInfo *Info);

void lexVerbatimBlockFirstLine(Token &T);		void lexVerbatimBlockFirstLine(Token &T);

void lexVerbatimBlockBody(Token &T);		void lexVerbatimBlockBody(Token &T);

void setupAndLexVerbatimLine(Token &T, const char *TextBegin,		void setupAndLexVerbatimLine(Token &T, const char *TextBegin,
const CommandInfo *Info);		const CommandInfo *Info);

void lexVerbatimLineText(Token &T);		void lexVerbatimLineText(Token &T);

void lexHTMLCharacterReference(Token &T);		void lexHTMLCharacterReference(Token &T);

void setupAndLexHTMLStartTag(Token &T);		void setupAndLexHTMLStartTag(Token &T);

void lexHTMLStartTag(Token &T);		void lexHTMLStartTag(Token &T);

void setupAndLexHTMLEndTag(Token &T);		void setupAndLexHTMLEndTag(Token &T);

void lexHTMLEndTag(Token &T);		void lexHTMLEndTag(Token &T);

public:		public:
Lexer(llvm::BumpPtrAllocator &Allocator, DiagnosticsEngine &Diags,		Lexer(llvm::BumpPtrAllocator &Allocator, DiagnosticsEngine &Diags,
const CommandTraits &Traits,		const CommandTraits &Traits, SourceLocation FileLoc,
SourceLocation FileLoc,		const char *BufferStart, const char *BufferEnd,
const char *BufferStart, const char *BufferEnd);		bool ParseCommands = true);

void lex(Token &T);		void lex(Token &T);

StringRef getSpelling(const Token &Tok,		StringRef getSpelling(const Token &Tok, const SourceManager &SourceMgr,
const SourceManager &SourceMgr,
bool *Invalid = nullptr) const;		bool *Invalid = nullptr) const;
};		};

} // end namespace comments		} // end namespace comments
} // end namespace clang		} // end namespace clang

#endif		#endif

cfe/trunk/include/clang/AST/RawCommentList.h

[105 lines not shown]	public:

const char *getBriefText(const ASTContext &Context) const {		const char *getBriefText(const ASTContext &Context) const {
if (BriefTextValid)		if (BriefTextValid)
return BriefText;		return BriefText;

return extractBriefText(Context);		return extractBriefText(Context);
}		}

		/// Returns sanitized comment text, suitable for presentation in editor UIs.
		/// E.g. will transform:
		/// // This is a long multiline comment.
		/// // Parts of it might be indented.
		/// /* The comments styles might be mixed. */
		/// into
		/// "This is a long multiline comment.\n"
		/// " Parts of it might be indented.\n"
		/// "The comments styles might be mixed."
		/// Also removes leading indentation and sanitizes some common cases:
		/// /* This is a first line.
		/// * This is a second line. It is indented.
		/// * This is a third line. */
		/// and
		/// /* This is a first line.
		/// This is a second line. It is indented.
		/// This is a third line. */
		/// will both turn into:
		/// "This is a first line.\n"
		/// " This is a second line. It is indented.\n"
		/// "This is a third line."
		std::string getFormattedText(const SourceManager &SourceMgr,
		DiagnosticsEngine &Diags) const;

/// Parse the comment, assuming it is attached to decl \c D.		/// Parse the comment, assuming it is attached to decl \c D.
comments::FullComment *parse(const ASTContext &Context,		comments::FullComment *parse(const ASTContext &Context,
const Preprocessor *PP, const Decl *D) const;		const Preprocessor *PP, const Decl *D) const;

private:		private:
SourceRange Range;		SourceRange Range;

mutable StringRef RawText;		mutable StringRef RawText;
[70 lines not shown]

cfe/trunk/lib/AST/CommentLexer.cpp

	[288 lines not shown]
	#endif			#endif
	BufferPtr = TokEnd;			BufferPtr = TokEnd;
	}			}

	void Lexer::lexCommentText(Token &T) {			void Lexer::lexCommentText(Token &T) {
	assert(CommentState == LCS_InsideBCPLComment ||			assert(CommentState == LCS_InsideBCPLComment ||
	CommentState == LCS_InsideCComment);			CommentState == LCS_InsideCComment);

				// Handles lexing non-command text, i.e. text and newline.
				auto HandleNonCommandToken = [&]() -> void {
				assert(State == LS_Normal);

				const char *TokenPtr = BufferPtr;
				assert(TokenPtr < CommentEnd);
				switch (*TokenPtr) {
				case '\n':
				case '\r':
				TokenPtr = skipNewline(TokenPtr, CommentEnd);
				formTokenWithChars(T, TokenPtr, tok::newline);

				if (CommentState == LCS_InsideCComment)
				skipLineStartingDecorations();
				return;

				default: {
				StringRef TokStartSymbols = ParseCommands ? "\n\r\\@&<" : "\n\r";
				size_t End = StringRef(TokenPtr, CommentEnd - TokenPtr)
				.find_first_of(TokStartSymbols);
				if (End != StringRef::npos)
				TokenPtr += End;
				else
				TokenPtr = CommentEnd;
				formTextToken(T, TokenPtr);
				return;
				}
				}
				};

				if (!ParseCommands)
				return HandleNonCommandToken();

	switch (State) {			switch (State) {
	case LS_Normal:			case LS_Normal:
	break;			break;
	case LS_VerbatimBlockFirstLine:			case LS_VerbatimBlockFirstLine:
	lexVerbatimBlockFirstLine(T);			lexVerbatimBlockFirstLine(T);
	return;			return;
	case LS_VerbatimBlockBody:			case LS_VerbatimBlockBody:
	lexVerbatimBlockBody(T);			lexVerbatimBlockBody(T);
	return;			return;
	case LS_VerbatimLineText:			case LS_VerbatimLineText:
	lexVerbatimLineText(T);			lexVerbatimLineText(T);
	return;			return;
	case LS_HTMLStartTag:			case LS_HTMLStartTag:
	lexHTMLStartTag(T);			lexHTMLStartTag(T);
	return;			return;
	case LS_HTMLEndTag:			case LS_HTMLEndTag:
	lexHTMLEndTag(T);			lexHTMLEndTag(T);
	return;			return;
	}			}

	assert(State == LS_Normal);			assert(State == LS_Normal);

	const char *TokenPtr = BufferPtr;			const char *TokenPtr = BufferPtr;
	assert(TokenPtr < CommentEnd);			assert(TokenPtr < CommentEnd);
	while (TokenPtr != CommentEnd) {
	switch(*TokenPtr) {			switch(*TokenPtr) {
	case '\\':			case '\\':
	case '@': {			case '@': {
	// Commands that start with a backslash and commands that start with			// Commands that start with a backslash and commands that start with
	// 'at' have equivalent semantics. But we keep information about the			// 'at' have equivalent semantics. But we keep information about the
	// exact syntax in AST for comments.			// exact syntax in AST for comments.
	tok::TokenKind CommandKind =			tok::TokenKind CommandKind =
	(*TokenPtr == '@') ? tok::at_command : tok::backslash_command;			(*TokenPtr == '@') ? tok::at_command : tok::backslash_command;
	TokenPtr++;			TokenPtr++;
	if (TokenPtr == CommentEnd) {			if (TokenPtr == CommentEnd) {
	formTextToken(T, TokenPtr);			formTextToken(T, TokenPtr);
	return;			return;
	}			}
	char C = *TokenPtr;			char C = *TokenPtr;
	switch (C) {			switch (C) {
	default:			default:
	break;			break;

	case '\\': case '@': case '&': case '$':			case '\\': case '@': case '&': case '$':
	case '#': case '<': case '>': case '%':			case '#': case '<': case '>': case '%':
	case '\"': case '.': case ':':			case '\"': case '.': case ':':
	// This is one of \\ \@ \& \$ etc escape sequences.			// This is one of \\ \@ \& \$ etc escape sequences.
	TokenPtr++;			TokenPtr++;
	if (C == ':' && TokenPtr != CommentEnd && *TokenPtr == ':') {			if (C == ':' && TokenPtr != CommentEnd && *TokenPtr == ':') {
	// This is the \:: escape sequence.			// This is the \:: escape sequence.
	TokenPtr++;			TokenPtr++;
	}			}
	StringRef UnescapedText(BufferPtr + 1, TokenPtr - (BufferPtr + 1));			StringRef UnescapedText(BufferPtr + 1, TokenPtr - (BufferPtr + 1));
	formTokenWithChars(T, TokenPtr, tok::text);			formTokenWithChars(T, TokenPtr, tok::text);
	T.setText(UnescapedText);			T.setText(UnescapedText);
	return;			return;
	}			}

	// Don't make zero-length commands.			// Don't make zero-length commands.
	if (!isCommandNameStartCharacter(*TokenPtr)) {			if (!isCommandNameStartCharacter(*TokenPtr)) {
	formTextToken(T, TokenPtr);			formTextToken(T, TokenPtr);
	return;			return;
	}			}

	TokenPtr = skipCommandName(TokenPtr, CommentEnd);			TokenPtr = skipCommandName(TokenPtr, CommentEnd);
	unsigned Length = TokenPtr - (BufferPtr + 1);			unsigned Length = TokenPtr - (BufferPtr + 1);

	// Hardcoded support for lexing LaTeX formula commands			// Hardcoded support for lexing LaTeX formula commands
	// \f$ \f[ \f] \f{ \f} as a single command.			// \f$ \f[ \f] \f{ \f} as a single command.
	if (Length == 1 && TokenPtr[-1] == 'f' && TokenPtr != CommentEnd) {			if (Length == 1 && TokenPtr[-1] == 'f' && TokenPtr != CommentEnd) {
	C = *TokenPtr;			C = *TokenPtr;
	if (C == '$' || C == '[' || C == ']' || C == '{' || C == '}') {			if (C == '$' || C == '[' || C == ']' || C == '{' || C == '}') {
	TokenPtr++;			TokenPtr++;
	Length++;			Length++;
	}			}
	}			}

	StringRef CommandName(BufferPtr + 1, Length);			StringRef CommandName(BufferPtr + 1, Length);

	const CommandInfo *Info = Traits.getCommandInfoOrNULL(CommandName);			const CommandInfo *Info = Traits.getCommandInfoOrNULL(CommandName);
	if (!Info) {			if (!Info) {
	if ((Info = Traits.getTypoCorrectCommandInfo(CommandName))) {			if ((Info = Traits.getTypoCorrectCommandInfo(CommandName))) {
	StringRef CorrectedName = Info->Name;			StringRef CorrectedName = Info->Name;
	SourceLocation Loc = getSourceLocation(BufferPtr);			SourceLocation Loc = getSourceLocation(BufferPtr);
	SourceLocation EndLoc = getSourceLocation(TokenPtr);			SourceLocation EndLoc = getSourceLocation(TokenPtr);
	SourceRange FullRange = SourceRange(Loc, EndLoc);			SourceRange FullRange = SourceRange(Loc, EndLoc);
	SourceRange CommandRange(Loc.getLocWithOffset(1), EndLoc);			SourceRange CommandRange(Loc.getLocWithOffset(1), EndLoc);
	Diag(Loc, diag::warn_correct_comment_command_name)			Diag(Loc, diag::warn_correct_comment_command_name)
	<< FullRange << CommandName << CorrectedName			<< FullRange << CommandName << CorrectedName
	<< FixItHint::CreateReplacement(CommandRange, CorrectedName);			<< FixItHint::CreateReplacement(CommandRange, CorrectedName);
	} else {			} else {
	formTokenWithChars(T, TokenPtr, tok::unknown_command);			formTokenWithChars(T, TokenPtr, tok::unknown_command);
	T.setUnknownCommandName(CommandName);			T.setUnknownCommandName(CommandName);
	Diag(T.getLocation(), diag::warn_unknown_comment_command_name)			Diag(T.getLocation(), diag::warn_unknown_comment_command_name)
	<< SourceRange(T.getLocation(), T.getEndLocation());			<< SourceRange(T.getLocation(), T.getEndLocation());
	return;			return;
	}			}
	}			}
	if (Info->IsVerbatimBlockCommand) {			if (Info->IsVerbatimBlockCommand) {
	setupAndLexVerbatimBlock(T, TokenPtr, *BufferPtr, Info);			setupAndLexVerbatimBlock(T, TokenPtr, *BufferPtr, Info);
	return;			return;
	}			}
	if (Info->IsVerbatimLineCommand) {			if (Info->IsVerbatimLineCommand) {
	setupAndLexVerbatimLine(T, TokenPtr, Info);			setupAndLexVerbatimLine(T, TokenPtr, Info);
	return;			return;
	}			}
	formTokenWithChars(T, TokenPtr, CommandKind);			formTokenWithChars(T, TokenPtr, CommandKind);
	T.setCommandID(Info->getID());			T.setCommandID(Info->getID());
	return;			return;
	}			}

	case '&':			case '&':
	lexHTMLCharacterReference(T);			lexHTMLCharacterReference(T);
	return;			return;

	case '<': {			case '<': {
	TokenPtr++;			TokenPtr++;
	if (TokenPtr == CommentEnd) {			if (TokenPtr == CommentEnd) {
	formTextToken(T, TokenPtr);			formTextToken(T, TokenPtr);
	return;			return;
	}			}
	const char C = *TokenPtr;			const char C = *TokenPtr;
	if (isHTMLIdentifierStartingCharacter(C))			if (isHTMLIdentifierStartingCharacter(C))
	setupAndLexHTMLStartTag(T);			setupAndLexHTMLStartTag(T);
	else if (C == '/')			else if (C == '/')
	setupAndLexHTMLEndTag(T);			setupAndLexHTMLEndTag(T);
	else			else
	formTextToken(T, TokenPtr);			formTextToken(T, TokenPtr);
	return;			return;
	}			}

	case '\n':			default:
	case '\r':			return HandleNonCommandToken();
	TokenPtr = skipNewline(TokenPtr, CommentEnd);
	formTokenWithChars(T, TokenPtr, tok::newline);

	if (CommentState == LCS_InsideCComment)
	skipLineStartingDecorations();
	return;

	default: {
	size_t End = StringRef(TokenPtr, CommentEnd - TokenPtr).
	find_first_of("\n\r\\@&<");
	if (End != StringRef::npos)
	TokenPtr += End;
	else
	TokenPtr = CommentEnd;
	formTextToken(T, TokenPtr);
	return;
	}
	}
	}			}
	}			}

	void Lexer::setupAndLexVerbatimBlock(Token &T,			void Lexer::setupAndLexVerbatimBlock(Token &T,
	const char *TextBegin,			const char *TextBegin,
	char Marker, const CommandInfo *Info) {			char Marker, const CommandInfo *Info) {
	assert(Info->IsVerbatimBlockCommand);			assert(Info->IsVerbatimBlockCommand);

	[266 lines not shown]
	void Lexer::lexHTMLEndTag(Token &T) {			void Lexer::lexHTMLEndTag(Token &T) {
	assert(BufferPtr != CommentEnd && *BufferPtr == '>');			assert(BufferPtr != CommentEnd && *BufferPtr == '>');

	formTokenWithChars(T, BufferPtr + 1, tok::html_greater);			formTokenWithChars(T, BufferPtr + 1, tok::html_greater);
	State = LS_Normal;			State = LS_Normal;
	}			}

	Lexer::Lexer(llvm::BumpPtrAllocator &Allocator, DiagnosticsEngine &Diags,			Lexer::Lexer(llvm::BumpPtrAllocator &Allocator, DiagnosticsEngine &Diags,
	const CommandTraits &Traits,			const CommandTraits &Traits, SourceLocation FileLoc,
	SourceLocation FileLoc,			const char *BufferStart, const char *BufferEnd,
	const char *BufferStart, const char *BufferEnd):			bool ParseCommands)
	Allocator(Allocator), Diags(Diags), Traits(Traits),			: Allocator(Allocator), Diags(Diags), Traits(Traits),
	BufferStart(BufferStart), BufferEnd(BufferEnd),			BufferStart(BufferStart), BufferEnd(BufferEnd), FileLoc(FileLoc),
	FileLoc(FileLoc), BufferPtr(BufferStart),			BufferPtr(BufferStart), CommentState(LCS_BeforeComment), State(LS_Normal),
	CommentState(LCS_BeforeComment), State(LS_Normal) {			ParseCommands(ParseCommands) {}
	}

	void Lexer::lex(Token &T) {			void Lexer::lex(Token &T) {
	again:			again:
	switch (CommentState) {			switch (CommentState) {
	case LCS_BeforeComment:			case LCS_BeforeComment:
	if (BufferPtr == BufferEnd) {			if (BufferPtr == BufferEnd) {
	formTokenWithChars(T, BufferPtr, tok::eof);			formTokenWithChars(T, BufferPtr, tok::eof);
	return;			return;
	[114 lines not shown]

cfe/trunk/lib/AST/RawCommentList.cpp

[329 lines not shown]	void RawCommentList::addDeserializedComments(ArrayRef<RawComment *> DeserializedComments) {
MergedComments.reserve(Comments.size() + DeserializedComments.size());		MergedComments.reserve(Comments.size() + DeserializedComments.size());

std::merge(Comments.begin(), Comments.end(),		std::merge(Comments.begin(), Comments.end(),
DeserializedComments.begin(), DeserializedComments.end(),		DeserializedComments.begin(), DeserializedComments.end(),
std::back_inserter(MergedComments),		std::back_inserter(MergedComments),
BeforeThanCompare<RawComment>(SourceMgr));		BeforeThanCompare<RawComment>(SourceMgr));
std::swap(Comments, MergedComments);		std::swap(Comments, MergedComments);
}		}

		std::string RawComment::getFormattedText(const SourceManager &SourceMgr,
		DiagnosticsEngine &Diags) const {
		llvm::StringRef CommentText = getRawText(SourceMgr);
		if (CommentText.empty())
		return "";

		llvm::BumpPtrAllocator Allocator;
		// We do not parse any commands, so CommentOptions are ignored by
		// comments::Lexer. Therefore, we just use default-constructed options.
		CommentOptions DefOpts;
		comments::CommandTraits EmptyTraits(Allocator, DefOpts);
		comments::Lexer L(Allocator, Diags, EmptyTraits, getSourceRange().getBegin(),
		CommentText.begin(), CommentText.end(),
		/*ParseCommands=*/false);

		std::string Result;
		// A column number of the first non-whitespace token in the comment text.
		// We skip whitespace up to this column, but keep the whitespace after this
		// column. IndentColumn is calculated when lexing the first line and reused
		// for the rest of lines.
		unsigned IndentColumn = 0;

		// Processes one line of the comment and adds it to the result.
		// Handles skipping the indent at the start of the line.
		// Returns false when eof is reached and true otherwise.
		auto LexLine = [&](bool IsFirstLine) -> bool {
		comments::Token Tok;
		// Lex the first token on the line. We handle it separately, because we need
		// to fix up its indentation.
		L.lex(Tok);
		if (Tok.is(comments::tok::eof))
		return false;
		if (Tok.is(comments::tok::newline)) {
		Result += "\n";
		return true;
		}
		llvm::StringRef TokText = L.getSpelling(Tok, SourceMgr);
		bool LocInvalid = false;
		unsigned TokColumn =
		SourceMgr.getSpellingColumnNumber(Tok.getLocation(), &LocInvalid);
		assert(!LocInvalid && "getFormattedText for invalid location");

		// Amount of leading whitespace in TokText.
		size_t WhitespaceLen = TokText.find_first_not_of(" \t");
		if (WhitespaceLen == StringRef::npos)
		WhitespaceLen = TokText.size();
		// Remember the amount of whitespace we skipped in the first line to remove
		// indent up to that column in the following lines.
		if (IsFirstLine)
		IndentColumn = TokColumn + WhitespaceLen;

		// Amount of leading whitespace we actually want to skip.
		// For the first line we skip all the whitespace.
		// For the rest of the lines, we skip whitespace up to IndentColumn.
		unsigned SkipLen =
		IsFirstLine
		? WhitespaceLen
		: std::min<size_t>(
		WhitespaceLen,
		std::max<int>(static_cast<int>(IndentColumn) - TokColumn, 0));
		llvm::StringRef Trimmed = TokText.drop_front(SkipLen);
		Result += Trimmed;
		// Lex all tokens in the rest of the line.
		for (L.lex(Tok); Tok.isNot(comments::tok::eof); L.lex(Tok)) {
		if (Tok.is(comments::tok::newline)) {
		Result += "\n";
		return true;
		}
		Result += L.getSpelling(Tok, SourceMgr);
		}
		// We've reached the end of file token.
		return false;
		};

		auto DropTrailingNewLines = [](std::string &Str) {
		while (Str.back() == '\n')
		Str.pop_back();
		};

		// Process first line separately to remember indent for the following lines.
		if (!LexLine(/*IsFirstLine=*/true)) {
		DropTrailingNewLines(Result);
		return Result;
		}
		// Process the rest of the lines.
		while (LexLine(/*IsFirstLine=*/false))
		;
		DropTrailingNewLines(Result);
		return Result;
		}
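
A note on `DropTrailingNewLines` as shown above: calling `std::string::back()` on an empty string is undefined behaviour, so the loop is only safe while the string still has a character left that is not a newline. Whether `Result` can be empty or consist only of newlines at that point is not established by this page. A defensive variant, written here as a free function with the hypothetical name `dropTrailingNewLines`, could look like this:

```cpp
#include <string>

// Removes trailing '\n' characters. The emptiness check makes the helper
// safe for empty strings and for strings made up entirely of newlines.
void dropTrailingNewLines(std::string &Str) {
  while (!Str.empty() && Str.back() == '\n')
    Str.pop_back();
}
```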

cfe/trunk/unittests/AST/CMakeLists.txt

	  set(LLVM_LINK_COMPONENTS
	    Support
	    )

	  add_clang_unittest(ASTTests
	    ASTContextParentMapTest.cpp
	    ASTImporterTest.cpp
	    ASTTypeTraitsTest.cpp
	    ASTVectorTest.cpp
	    CommentLexer.cpp
	    CommentParser.cpp
	+   CommentTextTest.cpp
	    DataCollectionTest.cpp
	    DeclPrinterTest.cpp
	    DeclTest.cpp
	    EvaluateAsRValueTest.cpp
	    ExternalASTSourceTest.cpp
	    NamedDeclPrinterTest.cpp
	    SourceLocationTest.cpp
	    StmtPrinterTest.cpp
	  (10 more unchanged lines not shown)

cfe/trunk/unittests/AST/CommentTextTest.cpp

				//===- unittest/AST/CommentTextTest.cpp - Comment text extraction test ----===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// Tests for user-friendly output formatting of comments, i.e.
				// RawComment::getFormattedText().
				//
				//===----------------------------------------------------------------------===//

				#include "clang/AST/RawCommentList.h"
				#include "clang/Basic/CommentOptions.h"
				#include "clang/Basic/Diagnostic.h"
				#include "clang/Basic/DiagnosticIDs.h"
				#include "clang/Basic/FileManager.h"
				#include "clang/Basic/FileSystemOptions.h"
				#include "clang/Basic/SourceLocation.h"
				#include "clang/Basic/SourceManager.h"
				#include "clang/Basic/VirtualFileSystem.h"
				#include "llvm/Support/MemoryBuffer.h"
				#include <gtest/gtest.h>

				namespace clang {

				class CommentTextTest : public ::testing::Test {
				protected:
				std::string formatComment(llvm::StringRef CommentText) {
				SourceManagerForFile FileSourceMgr("comment-test.cpp", CommentText);
				SourceManager& SourceMgr = FileSourceMgr.get();

				auto CommentStartOffset = CommentText.find("/");
				assert(CommentStartOffset != llvm::StringRef::npos);
				FileID File = SourceMgr.getMainFileID();

				SourceRange CommentRange(
				SourceMgr.getLocForStartOfFile(File).getLocWithOffset(
				CommentStartOffset),
				SourceMgr.getLocForEndOfFile(File));
				CommentOptions EmptyOpts;
				// FIXME: technically, the Merged value we set here is incorrect, but
				// that shouldn't matter.
				RawComment Comment(SourceMgr, CommentRange, EmptyOpts, /*Merged=*/true);
				DiagnosticsEngine Diags(new DiagnosticIDs, new DiagnosticOptions);
				return Comment.getFormattedText(SourceMgr, Diags);
				}
				};

				TEST_F(CommentTextTest, FormattedText) {
				// clang-format off
				auto ExpectedOutput =
				R"(This function does this and that.
				For example,
				Runnning it in that case will give you
				this result.
				That's about it.)";
				// Two-slash comments.
				EXPECT_EQ(ExpectedOutput, formatComment(
				R"cpp(
				// This function does this and that.
				// For example,
				// Runnning it in that case will give you
				// this result.
				// That's about it.)cpp"));

				// Three-slash comments.
				EXPECT_EQ(ExpectedOutput, formatComment(
				R"cpp(
				/// This function does this and that.
				/// For example,
				/// Runnning it in that case will give you
				/// this result.
				/// That's about it.)cpp"));

				// Block comments.
				EXPECT_EQ(ExpectedOutput, formatComment(
				R"cpp(
				/* This function does this and that.
				* For example,
				* Runnning it in that case will give you
				* this result.
				* That's about it.*/)cpp"));

				// Doxygen-style block comments.
				EXPECT_EQ(ExpectedOutput, formatComment(
				R"cpp(
				/** This function does this and that.
				* For example,
				* Runnning it in that case will give you
				* this result.
				* That's about it.*/)cpp"));

				// Weird indentation.
				EXPECT_EQ(ExpectedOutput, formatComment(
				R"cpp(
				// This function does this and that.
				// For example,
				// Runnning it in that case will give you
				// this result.
				// That's about it.)cpp"));
				// clang-format on
				}

				TEST_F(CommentTextTest, KeepsDoxygenControlSeqs) {
				// clang-format off
				auto ExpectedOutput =
				R"(\brief This is the brief part of the comment.
				\param a something about a.
				@param b something about b.)";

				EXPECT_EQ(ExpectedOutput, formatComment(
				R"cpp(
				/// \brief This is the brief part of the comment.
				/// \param a something about a.
				/// @param b something about b.)cpp"));
				// clang-format on
				}

				} // namespace clang

Revision Contents

Diff 147062
