This is an archive of the discontinued LLVM Phabricator instance.


Differential D33285

clang-format: do not reflow bullet lists
Closed · Public

Authored by Typz on May 17 2017, 8:59 AM.


Details

Reviewers

krasimir

Commits

rGa881be87ca7e: clang-format: do not reflow bullet lists
rC303556: clang-format: do not reflow bullet lists
rL303556: clang-format: do not reflow bullet lists

Summary

This patch prevents reflowing bullet lists in block comments.

It handles all lists supported by doxygen and markdown, e.g. bullet
lists starting with '-', '*', '+', as well as numbered lists starting
with -# or a number followed by a dot.
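
To make the list markers above concrete, here is a minimal stand-alone sketch of the prefix check using only the C++ standard library. The function name `looksLikeListItem` is hypothetical and is not part of clang-format; the prefixes and the numbered-list pattern are the ones that appear in the committed diff further down this page, and the input is assumed to be comment text already stripped of its `//` or `*` decoration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not the clang-format function): returns true when a
// line of comment text looks like the start of a doxygen/markdown list item.
bool looksLikeListItem(const std::string &Line) {
  // Ignore leading blanks before looking at the marker.
  const std::string::size_type Start = Line.find_first_not_of(" \t");
  if (Start == std::string::npos)
    return false;
  const std::string Content = Line.substr(Start);

  // Bullet markers must be followed by a space; "-#" is doxygen auto-numbering.
  static const char *const Prefixes[] = {"-# ", "- ", "+ ", "* "};
  for (const char *Prefix : Prefixes)
    if (Content.rfind(Prefix, 0) == 0) // starts-with idiom
      return true;

  // Explicit numbering: one or two digits, a dot, and a space.
  static const std::regex Numbered("^[1-9][0-9]?\\. ");
  return std::regex_search(Content, Numbered);
}

// Usage example: a few inputs and the classification this sketch gives them.
void checkListItemExamples() {
  assert(looksLikeListItem("- item"));
  assert(looksLikeListItem("2. second"));
  assert(!looksLikeListItem("1024. long."));
}
```

A line such as `-not a bullet` is left reflowable by this sketch because the marker is not followed by a space.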

Diff Detail

Repository: rL LLVM

Event Timeline

Typz created this revision. May 17 2017, 8:59 AM

Herald added a subscriber: klimek. May 17 2017, 9:00 AM

djasper removed a reviewer: djasper. May 17 2017, 1:38 PM

djasper added a subscriber: djasper.

krasimir added inline comments. May 18 2017, 12:26 AM

lib/Format/BreakableToken.cpp
313 ↗	(On Diff #99312)	A problem with this is that sometimes you have a sentence ending with a number, like this one, in 2016. If this sentence also happens to just go over the column width, its last part would be reflown and during subsequent passes it will be seen as a numbered list, which is super unfortunate. I'd like us to come up with a more refined strategy of handling this case. Maybe we should look at how others are doing it?
315 ↗	(On Diff #99312)	This builds an `llvm::Regex` on each invocation, which is wasteful.
unittests/Format/FormatTestComments.cpp
1663 ↗	(On Diff #99312)	I'd also like to see tests where we correctly reflow lists with multiline entries.

Typz marked an inline comment as done. May 18 2017, 5:53 AM

Typz added inline comments.

lib/Format/BreakableToken.cpp
313 ↗	(On Diff #99312)	Looking at doxygen, it seems there is no extra protection: just a number followed by a dot. So to avoid the issue, we should never break before such a sequence. We could also limit the size of the number in the expression: I am not sure there are cases where bullet lists with hundreds of items are used, especially with explicit values (using the auto-numbering -# would be much simpler in that case). Maybe a limit of 2 digits? The same limit would be applied to prevent breaking before a number followed by a dot. What do you think?
315 ↗	(On Diff #99312)	I did this to keep the function re-entrant, but since the code is not multi-threaded I can use a static variable instead.
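
The "static variable" approach discussed for line 315 can be sketched as follows. This is an illustration with `std::regex` rather than `llvm::Regex`, and the names `buildNumberedListRegex`, `startsNumberedItem` and `NumRegexBuilds` are made up for the example. The counter only exists to show that the pattern is compiled once. Since C++11, the initialization of a function-local static is itself guaranteed to happen exactly once, even with concurrent callers.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Illustrative counter: how many times the pattern has been compiled.
int NumRegexBuilds = 0;

// Hypothetical factory, used only so the construction can be counted.
std::regex buildNumberedListRegex() {
  ++NumRegexBuilds;
  return std::regex("^[1-9][0-9]?\\. ");
}

// The function-local static is initialized on the first call and reused on
// every later call, instead of recompiling the pattern per invocation.
bool startsNumberedItem(const std::string &Content) {
  static const std::regex Pattern = buildNumberedListRegex();
  return std::regex_search(Content, Pattern);
}

// Usage example: several calls, a single regex construction.
void checkStaticRegexExample() {
  assert(startsNumberedItem("1. first"));
  assert(!startsNumberedItem("word"));
  assert(NumRegexBuilds == 1);
}
```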

Typz marked 2 inline comments as done. May 18 2017, 5:54 AM

Use static regex to avoid recreating it each time
Add more tests

krasimir added inline comments. May 18 2017, 6:39 AM

lib/Format/BreakableToken.cpp
313 ↗	(On Diff #99312)	I like the combination of the two options: let's limit to 2 digits and not break before a matching numbered list sequence followed by a fullstop. That would require also a little change to `BreakableToken::getCommentSplit`.

Typz marked 3 inline comments as done. May 18 2017, 7:58 AM

Typz added inline comments.

lib/Format/BreakableToken.cpp
313 ↗	(On Diff #99312)	Done, but I could not find a use case where this would break subsequent passes, apart from re-running clang-format; in that case it is fine, since the comments are already formatted to fit and will thus not be reflowed.

Limit to 2 digits and not break before a matching numbered list sequence followed by a fullstop, to avoid interpreting numbers at the end of sentence as numbered bullets (and thus preventing reflow).
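
The "do not break before a numbered-list sequence" rule can be illustrated with a small stand-alone sketch. It assumes plain `std::string` and `std::regex` instead of `StringRef` and `llvm::Regex`, ignores multi-byte characters and tab widths, and uses a hypothetical function name `findBreakOffset`. Like the committed change shown in the diff below, it steps back by one blank only.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical stand-alone version of the idea. Returns the offset of the
// blank at which Text should be broken, considering only blanks strictly
// before MaxSplit, or std::string::npos if there is no usable blank.
std::string::size_type findBreakOffset(const std::string &Text,
                                       std::string::size_type MaxSplit) {
  static const char Blanks[] = " \t";
  static const std::regex NumberedItem("^[1-9][0-9]?\\.");
  const std::string::size_type npos = std::string::npos;

  if (MaxSplit == 0)
    return npos;
  // std::string::find_last_of includes the given position, hence the "- 1".
  std::string::size_type Space = Text.find_last_of(Blanks, MaxSplit - 1);
  if (Space == npos)
    return npos;

  // If the next line would start with "N." or "NN.", it would later be read
  // as a numbered list item, so fall back to the previous blank instead.
  const std::string::size_type Next = Text.find_first_not_of(Blanks, Space);
  if (Next != npos && std::regex_search(Text.substr(Next), NumberedItem))
    Space = Space == 0 ? npos : Text.find_last_of(Blanks, Space - 1);
  return Space;
}

// Usage example: with room for 18 characters, the break moves from the blank
// before "10." (offset 14) back to the previous blank (offset 9).
void checkBreakOffsetExample() {
  assert(findBreakOffset("long long long 10. long.", 18) == 9);
  assert(findBreakOffset("long long long long", 18) == 14);
}
```

With the first input, the text is split as "long long" and "long 10. long.", which matches the shape of the last new test case in the diff.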

Harbormaster completed remote builds in B6544: Diff 99436. May 18 2017, 8:01 AM

Looks great!

This revision is now accepted and ready to land. May 22 2017, 5:39 AM

Closed by commit rL303556: clang-format: do not reflow bullet lists (authored by Typz). May 22 2017, 7:47 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
cfe/trunk/lib/Format/BreakableToken.cpp	21 lines
cfe/trunk/unittests/Format/FormatTestComments.cpp	67 lines

Diff 99764

cfe/trunk/lib/Format/BreakableToken.cpp

[72 lines not shown]
     unsigned BytesInChar =
         encoding::getCodePointNumBytes(Text[MaxSplitBytes], Encoding);
     NumChars +=
         encoding::columnWidthWithTabs(Text.substr(MaxSplitBytes, BytesInChar),
                                       ContentStartColumn, TabWidth, Encoding);
     MaxSplitBytes += BytesInChar;
   }

   StringRef::size_type SpaceOffset = Text.find_last_of(Blanks, MaxSplitBytes);

+  // Do not split before a number followed by a dot: this would be interpreted
+  // as a numbered list, which would prevent re-flowing in subsequent passes.
+  static llvm::Regex kNumberedListRegexp = llvm::Regex("^[1-9][0-9]?\\.");
+  if (SpaceOffset != StringRef::npos &&
+      kNumberedListRegexp.match(Text.substr(SpaceOffset).ltrim(Blanks)))
+    SpaceOffset = Text.find_last_of(Blanks, SpaceOffset);
+
   if (SpaceOffset == StringRef::npos ||
       // Don't break at leading whitespace.
       Text.find_last_not_of(Blanks, SpaceOffset) == StringRef::npos) {
     // Make sure that we don't break at leading whitespace that
     // reaches past MaxSplit.
     StringRef::size_type FirstNonWhitespace = Text.find_first_not_of(Blanks);
     if (FirstNonWhitespace == StringRef::npos)
       // If the comment is only whitespace, we cannot split.
[205 lines not shown]

 const FormatToken &BreakableComment::tokenAt(unsigned LineIndex) const {
   return Tokens[LineIndex] ? *Tokens[LineIndex] : Tok;
 }

 static bool mayReflowContent(StringRef Content) {
   Content = Content.trim(Blanks);
   // Lines starting with '@' commonly have special meaning.
-  static const SmallVector<StringRef, 4> kSpecialMeaningPrefixes = {
-      "@", "TODO", "FIXME", "XXX"};
+  // Lines starting with '-', '-#', '+' or '*' are bulleted/numbered lists.
+  static const SmallVector<StringRef, 8> kSpecialMeaningPrefixes = {
+      "@", "TODO", "FIXME", "XXX", "-# ", "- ", "+ ", "* " };
   bool hasSpecialMeaningPrefix = false;
   for (StringRef Prefix : kSpecialMeaningPrefixes) {
     if (Content.startswith(Prefix)) {
       hasSpecialMeaningPrefix = true;
       break;
     }
   }

+  // Numbered lists may also start with a number followed by '.'
+  // To avoid issues if a line starts with a number which is actually the end
+  // of a previous line, we only consider numbers with up to 2 digits.
+  static llvm::Regex kNumberedListRegexp = llvm::Regex("^[1-9][0-9]?\\. ");
+  hasSpecialMeaningPrefix = hasSpecialMeaningPrefix ||
+                            kNumberedListRegexp.match(Content);
+
   // Simple heuristic for what to reflow: content should contain at least two
   // characters and either the first or second character must be
   // non-punctuation.
   return Content.size() >= 2 && !hasSpecialMeaningPrefix &&
          !Content.endswith("\\") &&
          // Note that this is UTF-8 safe, since if isPunctuation(Content[0]) is
          // true, then the first code point must be 1 byte long.
          (!isPunctuation(Content[0]) || !isPunctuation(Content[1]));
[587 lines not shown]

cfe/trunk/unittests/Format/FormatTestComments.cpp

[1,571 lines not shown; enclosing context: TEST_F(FormatTestComments, ReflowsComments) {]
   EXPECT_EQ("/* long long long\n"
             " * long\n"
             " *\n"
             " * long */",
             format("/* long long long long\n"
                    " *\n"
                    " * long */",
                    getLLVMStyleWithColumns(20)));

   // Don't reflow lines having content that is a single character.
   EXPECT_EQ("// long long long\n"
             "// long\n"
             "// l",
             format("// long long long long\n"
                    "// l",
                    getLLVMStyleWithColumns(20)));

   // Don't reflow lines starting with two punctuation characters.
   EXPECT_EQ("// long long long\n"
             "// long\n"
             "// ... --- ...",
             format(
                 "// long long long long\n"
                 "// ... --- ...",
                 getLLVMStyleWithColumns(20)));

   // Don't reflow lines starting with '@'.
   EXPECT_EQ("// long long long\n"
             "// long\n"
             "// @param arg",
             format("// long long long long\n"
                    "// @param arg",
                    getLLVMStyleWithColumns(20)));

   // Don't reflow lines starting with 'TODO'.
   EXPECT_EQ("// long long long\n"
             "// long\n"
             "// TODO: long",
             format("// long long long long\n"
                    "// TODO: long",
                    getLLVMStyleWithColumns(20)));
[52 lines not shown; enclosing context: TEST_F(FormatTestComments, ReflowsComments) {]
   // Don't reflow lines having different indentation.
   EXPECT_EQ("// long long long\n"
             "// long\n"
             "// long",
             format("// long long long long\n"
                    "// long",
                    getLLVMStyleWithColumns(20)));

+  // Don't reflow separate bullets in list
+  EXPECT_EQ("// - long long long\n"
+            "// long\n"
+            "// - long",
+            format("// - long long long long\n"
+                   "// - long",
+                   getLLVMStyleWithColumns(20)));
+  EXPECT_EQ("// * long long long\n"
+            "// long\n"
+            "// * long",
+            format("// * long long long long\n"
+                   "// * long",
+                   getLLVMStyleWithColumns(20)));
+  EXPECT_EQ("// + long long long\n"
+            "// long\n"
+            "// + long",
+            format("// + long long long long\n"
+                   "// + long",
+                   getLLVMStyleWithColumns(20)));
+  EXPECT_EQ("// 1. long long long\n"
+            "// long\n"
+            "// 2. long",
+            format("// 1. long long long long\n"
+                   "// 2. long",
+                   getLLVMStyleWithColumns(20)));
+  EXPECT_EQ("// -# long long long\n"
+            "// long\n"
+            "// -# long",
+            format("// -# long long long long\n"
+                   "// -# long",
+                   getLLVMStyleWithColumns(20)));
+
+  EXPECT_EQ("// - long long long\n"
+            "// long long long\n"
+            "// - long",
+            format("// - long long long long\n"
+                   "// long long\n"
+                   "// - long",
+                   getLLVMStyleWithColumns(20)));
+  EXPECT_EQ("// - long long long\n"
+            "// long long long\n"
+            "// long\n"
+            "// - long",
+            format("// - long long long long\n"
+                   "// long long long\n"
+                   "// - long",
+                   getLLVMStyleWithColumns(20)));
+
+  // Large number (>2 digits) are not list items
+  EXPECT_EQ("// long long long\n"
+            "// long 1024. long.",
+            format("// long long long long\n"
+                   "// 1024. long.",
+                   getLLVMStyleWithColumns(20)));
+
+  // Do not break before number, to avoid introducing a non-reflowable doxygen
+  // list item.
+  EXPECT_EQ("// long long\n"
+            "// long 10. long.",
+            format("// long long long 10.\n"
+                   "// long.",
+                   getLLVMStyleWithColumns(20)));
+
   // Don't break or reflow after implicit string literals.
   verifyFormat("#include <t> // l l l\n"
                " // l",
                getLLVMStyleWithColumns(20));

   // Don't break or reflow comments on import lines.
   EXPECT_EQ("#include \"t\" /* l l l\n"
             " * l */",
[735 lines not shown]