This is an archive of the discontinued LLVM Phabricator instance.

Differential D42036

[clang-format] Keep comments aligned to macros
Needs Review · Public

Authored by mzeren-vmw on Jan 14 2018, 7:47 AM.

Details

Reviewers

krasimir
klimek

Summary

r312125, which introduced preprocessor indentation, shipped with a known
issue where "indentation of comments immediately before indented
preprocessor lines is toggled on each run". For example these two forms
toggle:

#ifndef HEADER_H
#define HEADER_H
#if 1
// comment
#   define A 0
#endif
#endif

#ifndef HEADER_H
#define HEADER_H
#if 1
   // comment
#   define A 0
#endif
#endif

This happens because we check vertical alignment against the "#" yet
indent to the level of the "define". This patch resolves this issue by
checking vertical alignment against the "define", and by tracking a
"LevelOffset" (0 or 1) in each AnnotatedLine to account for the
off-by-one indentation of preprocessor lines.
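
The toggle can be reproduced with a deliberately simplified model of one comment line followed by one indented directive. This is a sketch under stated assumptions, not clang-format's code. `Model`, `directiveColumn`, `formatOld` and `formatNew` are hypothetical names. The model assumes that '#' sits in column 0, that the directive name starts at column 1 + Level * IndentWidth, and that an unaligned comment keeps its own level. With IndentWidth = 3 and the directive at level 1, as in the example above, `formatOld` moves a column-0 comment to column 3 and a column-3 comment back to column 0. By contrast, `formatNew` leaves both a column-0 comment and a column-4 comment where they are.

```cpp
#include <cassert>

// Hypothetical, simplified model of one comment line followed by an indented
// preprocessor line such as "#   define A 0". Not clang-format's data types.
struct Model {
  unsigned IndentWidth;     // columns per indentation level
  unsigned DirectiveLevel;  // Level of the following preprocessor line
  unsigned CommentOwnLevel; // Level the comment keeps when it is not aligned
};

// '#' is assumed to sit in column 0, so the directive name ("define") starts
// one column further right than Level * IndentWidth.
unsigned directiveColumn(const Model &M) {
  return 1 + M.DirectiveLevel * M.IndentWidth;
}

// Old behaviour: test alignment against the '#' (column 0), but indent to
// Level * IndentWidth, which lands one column short of "define".
unsigned formatOld(const Model &M, unsigned CommentColumn) {
  bool AlignedWithHash = CommentColumn == 0;
  unsigned Level = AlignedWithHash ? M.DirectiveLevel : M.CommentOwnLevel;
  return Level * M.IndentWidth;
}

// New behaviour: test alignment against "define" and add a LevelOffset of 1
// to account for the leading '#'.
unsigned formatNew(const Model &M, unsigned CommentColumn) {
  bool AlignedWithDirective = CommentColumn == directiveColumn(M);
  unsigned Level = AlignedWithDirective ? M.DirectiveLevel : M.CommentOwnLevel;
  unsigned LevelOffset = AlignedWithDirective ? 1 : 0;
  return Level * M.IndentWidth + LevelOffset;
}
```

Feeding the output of `formatNew` back into it returns the same column, which is the property the old check lacked.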

Diff Detail

Repository

rC Clang

Build Status

Buildable 13818
Build 13818: arc lint + arc unit

Event Timeline

mzeren-vmw created this revision. Jan 14 2018, 7:47 AM

Herald added a subscriber: cfe-commits. Jan 14 2018, 7:47 AM

Harbormaster completed remote builds in B13818: Diff 129777. Jan 14 2018, 7:49 AM

klimek added a comment.

Just from a formatting point of view, why not:

//.   Comment
#.    define X

In D42036#976827, @klimek wrote:
Just from a formatting point of view, why not:
//.   Comment
#.    define X

(assuming the '.'s are unintentional)
There is some logic in placing // in column 0, since we place # in column 0. However, we do not have examples of that style in our code base. We do have examples of aligning // above define:

   // Comment
#  define

Are you suggesting that c-f manage the indent after the // (or ///, etc.)? This seems more complex than managing the space between # and <directive>. I do want c-f to be able to re-indent aligned comments if an #if is inserted above.
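
For comparison, the layouts discussed here can be generated by a small helper. It is a hypothetical sketch, not clang-format code. `CommentStyle` and `render` are invented names, and the helper assumes '#' in column 0 with the directive name starting at column 1 + Level * IndentWidth. `HashAligned` keeps `//` in column 0. `DirectiveAligned` indents `//` to the directive name, which is the style mzeren-vmw describes. `PaddedAfterSlashes` keeps `//` in column 0 and pads the comment text so that it starts where `define` does, which is one reading of klimek's suggestion.

```cpp
#include <cassert>
#include <string>

enum class CommentStyle { HashAligned, DirectiveAligned, PaddedAfterSlashes };

// Hypothetical helper: renders "// <Text>" followed by
// "#<spaces>define <Macro>" for a directive at preprocessor level Level.
std::string render(CommentStyle S, const std::string &Text,
                   const std::string &Macro, unsigned Level,
                   unsigned IndentWidth) {
  const unsigned DirectiveColumn = 1 + Level * IndentWidth; // column of "define"
  std::string Directive =
      "#" + std::string(Level * IndentWidth, ' ') + "define " + Macro;
  std::string Comment;
  switch (S) {
  case CommentStyle::HashAligned:
    Comment = "// " + Text;
    break;
  case CommentStyle::DirectiveAligned:
    Comment = std::string(DirectiveColumn, ' ') + "// " + Text;
    break;
  case CommentStyle::PaddedAfterSlashes: {
    // Keep "//" in column 0 and start the text where "define" starts, or one
    // space after "//" if the directive is not indented far enough.
    unsigned Pad = DirectiveColumn > 2 ? DirectiveColumn - 2 : 1;
    Comment = "//" + std::string(Pad, ' ') + Text;
    break;
  }
  }
  return Comment + "\n" + Directive + "\n";
}
```

The third case suggests why managing the text after the marker is harder. The padding would depend on the width of the comment marker, which differs for `//`, `///` and `/* */`.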

krasimir added inline comments. Jan 18 2018, 7:24 AM

lib/Format/TokenAnnotator.cpp
1710	Please comment these.
1756	This feels a bit awkward: we're adding code that implicitly assumes the exact style the preprocessor directives and comments around them are handled. Maybe if this could become part of the level itself, it would feel less awkward.
lib/Format/TokenAnnotator.h
41	Is there a way to not introduce `LevelOffset`, but have it part of `Level`?
unittests/Format/FormatTest.cpp
2619	I would like to see test including multiline `//`-comment sections before, inside and after preprocessor directives as well as `/**/`-style comments.

Documented CommentAlignment enumerators. Documenting them suggested better
enumerator names.

Added tests for multi-line comments, block comments and trailing comments.

mzeren-vmw removed a reviewer: euhlmann. Jan 18 2018, 3:21 PM

mzeren-vmw marked 2 inline comments as done.

mzeren-vmw added inline comments.

lib/Format/TokenAnnotator.cpp
1756	I agree that the "long distance coupling" is awkward. Perhaps the new enumerator names make this a bit more palatable?
lib/Format/TokenAnnotator.h
41	`Level` is an abstract indentation level, while `LevelOffset` is "columns". They have different units. Maybe it would be feasible to change the units of "Level" to columns in order to merge these two variables, but doing so would throw away information. It also seems like a much larger change. We could create a composite type `class AnnotatedLevel { private: unsigned Level, unsigned Offset public: <strongly typed operations> }` but that seems over-engineered. Any other ideas?
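
The composite type mentioned above might look roughly like the following. This is a hypothetical sketch of the idea, not code from the patch. `AnnotatedLevel`, `nested()`, `withHashOffset()` and `toColumns()` are invented names. The design keeps the level (in IndentWidth units) and the offset (in columns) in separate fields and combines them only in `toColumns()`. That separation is what would prevent the two units from being mixed by accident.

```cpp
#include <cassert>

// Hypothetical sketch: an indentation level (in IndentWidth units) plus a
// small column offset (in columns), kept apart so the units cannot mix.
class AnnotatedLevel {
public:
  explicit AnnotatedLevel(unsigned Level, unsigned Offset = 0)
      : Level(Level), Offset(Offset) {}

  unsigned level() const { return Level; }
  unsigned offset() const { return Offset; }

  // One level deeper; in this sketch a nested line does not inherit the
  // column offset.
  AnnotatedLevel nested() const { return AnnotatedLevel(Level + 1); }

  // Same level, shifted by one column for the leading '#' of a directive.
  AnnotatedLevel withHashOffset() const { return AnnotatedLevel(Level, 1); }

  // The only place where the two units are combined.
  unsigned toColumns(unsigned IndentWidth) const {
    return Level * IndentWidth + Offset;
  }

private:
  unsigned Level;
  unsigned Offset;
};
```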

krasimir added inline comments. Jan 22 2018, 9:11 AM

lib/Format/TokenAnnotator.cpp
1725	Why are we checking `NextNonComment.Level > 0` here? We could be aligned with the next preprocessor directive even at level 0.
1729	I think this should be enabled only if preprocessor indentation has been enabled.
1756	Could you add a comment that this `1` comes from the `#` in the preprocessor directive?
unittests/Format/FormatTest.cpp
2629	I would expect this comment to be aligned with the `#endif` on the next line.
2657	Same here.
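
The sketch below shows one way to read krasimir's first two points: drop the `Level > 0` test, and consult the directive name only when preprocessor indentation is on. It uses stand-in types. `Line`, `PPIndentEnabled` and the field names are simplifications invented for illustration, not clang-format's `AnnotatedLine` or `FormatStyle`. `IsPreprocessor` also folds together the patch's `LT_PreprocessorDirective` and `LT_ImportStatement` checks. The enumerator comments are this sketch's own wording, not the documentation the author later added.

```cpp
#include <cassert>

// Outcome of comparing a comment's column with the next non-comment line.
enum CommentAlignment {
  CA_None,        // not aligned: the comment keeps its own level
  CA_Code,        // aligned with the first token of the next line
  CA_Preprocessor // aligned with the directive name after the '#'
};

// Hypothetical stand-in for the fields the check reads.
struct Line {
  bool IsPreprocessor;     // line is a directive such as "#  define"
  bool InPPDirective;      // line is itself inside a directive
  unsigned NewlinesBefore; // newlines before the line's first token
  unsigned FirstColumn;    // original column of the first token ('#')
  unsigned SecondColumn;   // original column of the second token, if any
  bool HasSecondToken;
};

CommentAlignment getCommentAlignment(const Line &Comment, const Line &Next,
                                     bool PPIndentEnabled) {
  if (Next.NewlinesBefore > 1)
    return CA_None;
  // No "Level > 0" test, so a level-0 directive can be matched too. The
  // directive name is only consulted when preprocessor indentation is on.
  if (PPIndentEnabled && Next.IsPreprocessor && !Comment.InPPDirective)
    return (Next.HasSecondToken && Comment.FirstColumn == Next.SecondColumn)
               ? CA_Preprocessor
               : CA_None;
  return Comment.FirstColumn == Next.FirstColumn ? CA_Code : CA_None;
}
```

Under these assumptions, a column-0 comment in front of `#define X` still yields `CA_None` rather than `CA_Code`. That matters because `CA_Code` would copy the directive's level without the extra column and reintroduce the toggle.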

djasper added a comment.

While I agree that there is probably a bug to fix, I don't (yet) agree with what is proposed in this patch. I think a comment in between preprocessor directives should always either:

- Be considered part of the code in between the #-lines, or
- Be considered to be commenting on the subsequent #-line.

In the former case, we need to indent with the regular IndentWidth, completely irrespective of anything inside the preprocessor lines. In the latter case, we should align with the # in column 0. To me, aligning with the define seems fundamentally wrong.
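
djasper's two cases can be written down as a small decision function. This is a hypothetical sketch of the policy he describes, not code from clang-format or from this review. `CommentRole`, `commentColumn` and the parameters are invented for illustration. A comment that belongs to the surrounding code is indented by the regular IndentWidth for its code level. A comment that describes the next #-line goes to column 0, together with the #.

```cpp
#include <cassert>

// Hypothetical classification of a comment that sits between #-lines.
enum class CommentRole {
  PartOfCode,     // belongs to the code between the preprocessor lines
  DescribesNextPP // documents the #-line that follows it
};

// Target column under the policy sketched above. CodeLevel is the comment's
// level in the surrounding code; preprocessor nesting is deliberately ignored.
unsigned commentColumn(CommentRole Role, unsigned CodeLevel,
                       unsigned IndentWidth) {
  switch (Role) {
  case CommentRole::PartOfCode:
    return CodeLevel * IndentWidth;
  case CommentRole::DescribesNextPP:
    return 0; // same column as the '#'
  }
  return 0; // not reached; avoids a missing-return warning
}
```

Neither branch reads how far the directive is indented after the #. That indentation is the coupling behind the toggle described in the summary.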

In D42036#984401, @djasper wrote:

To me, aligning with the define seems fundamentally wrong.

We definitely have code that does that internally. It can also be seen in the wild, e.g.:
https://github.com/boostorg/config/blob/develop/include/boost/config/detail/posix_features.hpp
However, it seems reasonable that clang-format's "default" be alignment with #. That will be a simpler patch, and it will resolve the toggling behavior. Let me work that up in a separate review.

Revision Contents

Path                                     Size
lib/Format/TokenAnnotator.h              8 lines
lib/Format/TokenAnnotator.cpp            38 lines
lib/Format/UnwrappedLineFormatter.cpp    1 line
unittests/Format/FormatTest.cpp          44 lines

Diff 129777

lib/Format/TokenAnnotator.h

@@ ... @@ enum LineType {
   LT_Other,
   LT_PreprocessorDirective,
   LT_VirtualFunctionDecl
 };

 class AnnotatedLine {
 public:
   AnnotatedLine(const UnwrappedLine &Line)
-      : First(Line.Tokens.front().Tok), Level(Line.Level),
+      : First(Line.Tokens.front().Tok), Level(Line.Level), LevelOffset(0),
         MatchingOpeningBlockLineIndex(Line.MatchingOpeningBlockLineIndex),
         InPPDirective(Line.InPPDirective),
         MustBeDeclaration(Line.MustBeDeclaration), MightBeFunctionDecl(false),
         IsMultiVariableDeclStmt(false), Affected(false),
         LeadingEmptyLinesAffected(false), ChildrenAffected(false),
         FirstStartColumn(Line.FirstStartColumn) {
     assert(!Line.Tokens.empty());

@@ ... @@ public:

   FormatToken *First;
   FormatToken *Last;

   SmallVector<AnnotatedLine *, 0> Children;

   LineType Type;
   unsigned Level;
+
+  /// Adjustment to Level based indent. When comments are aligned to the next
+  /// preprocessor line they must use the same offset as the directive,
+  /// typically 1 due to the leading #.
+  unsigned LevelOffset;

   size_t MatchingOpeningBlockLineIndex;
   bool InPPDirective;
   bool MustBeDeclaration;
   bool MightBeFunctionDecl;
   bool IsMultiVariableDeclStmt;

   /// \c True if this line should be formatted, i.e. intersects directly or
   /// indirectly with one of the input ranges.
@@ ... @@

lib/Format/TokenAnnotator.cpp

@@ ... @@ while (Current &&
       Current = Current->Next;
     }
   }

   const FormatStyle &Style;
   const AdditionalKeywords &Keywords;
   FormatToken *Current;
 };

+enum CommentAlignment { CA_None, CA_Code, CA_Preprocessor };
+
+CommentAlignment getCommentAlignment(const AnnotatedLine &Comment,
+                                     const AnnotatedLine &NextNonComment) {
+  if (NextNonComment.First->NewlinesBefore > 1)
+    return CA_None;
+  // If the next line is an indented preprocessor line look at the directive,
+  // not the #.
+  if ((NextNonComment.Type == LT_PreprocessorDirective ||
+       NextNonComment.Type == LT_ImportStatement) &&
+      NextNonComment.Level > 0 && !Comment.InPPDirective)
+    return NextNonComment.First->Next &&
+                   (Comment.First->OriginalColumn ==
+                    NextNonComment.First->Next->OriginalColumn)
+               ? CA_Preprocessor
+               : CA_None;
+  else
+    return Comment.First->OriginalColumn == NextNonComment.First->OriginalColumn
+               ? CA_Code
+               : CA_None;
+}
+
 } // end anonymous namespace

 void TokenAnnotator::setCommentLineLevels(
     SmallVectorImpl<AnnotatedLine *> &Lines) {
   const AnnotatedLine *NextNonCommentLine = nullptr;
   for (SmallVectorImpl<AnnotatedLine *>::reverse_iterator I = Lines.rbegin(),
                                                           E = Lines.rend();
        I != E; ++I) {
     bool CommentLine = true;
     for (const FormatToken *Tok = (*I)->First; Tok; Tok = Tok->Next) {
       if (!Tok->is(tok::comment)) {
         CommentLine = false;
         break;
       }
     }

-    if (NextNonCommentLine && CommentLine) {
-      // If the comment is currently aligned with the line immediately following
-      // it, that's probably intentional and we should keep it.
-      bool AlignedWithNextLine =
-          NextNonCommentLine->First->NewlinesBefore <= 1 &&
-          NextNonCommentLine->First->OriginalColumn ==
-              (*I)->First->OriginalColumn;
-      if (AlignedWithNextLine)
+    // If the comment is aligned with the line immediately following it, that's
+    // probably intentional and we should keep it.
+    if (CommentLine && NextNonCommentLine) {
+      CommentAlignment Alignment =
+          getCommentAlignment(**I, *NextNonCommentLine);
+      if (Alignment != CA_None)
         (*I)->Level = NextNonCommentLine->Level;
+      if (Alignment == CA_Preprocessor)
+        (*I)->LevelOffset = 1;
     } else {
       NextNonCommentLine = (*I)->First->isNot(tok::r_brace) ? (*I) : nullptr;
     }

     setCommentLineLevels((*I)->Children);
   }
 }

@@ ... @@

lib/Format/UnwrappedLineFormatter.cpp

@@ ... @@ public:

   /// \brief Returns the indent for the current line.
   unsigned getIndent() const { return Indent; }

   /// \brief Update the indent state given that \p Line is going to be formatted
   /// next.
   void nextLine(const AnnotatedLine &Line) {
     Offset = getIndentOffset(*Line.First);
+    Offset += Line.LevelOffset;
     // Update the indent level cache size so that we can rely on it
     // having the right size in adjustToUnmodifiedline.
     while (IndentForLevel.size() <= Line.Level)
       IndentForLevel.push_back(-1);
     if (Line.InPPDirective) {
       Indent = Line.Level * Style.IndentWidth + AdditionalIndent;
     } else {
       IndentForLevel.resize(Line.Level + 1);
@@ ... @@

unittests/Format/FormatTest.cpp

@@ ... @@ verifyFormat("#ifndef HEADER_H\n"
                Style);
   // Include guards must have a #define with the same variable immediately
   // after #ifndef.
   verifyFormat("#ifndef NOT_GUARD\n"
                "# define FOO\n"
                "code();\n"
                "#endif",
                Style);

   // Include guards must cover the entire file.
   verifyFormat("code();\n"
                "code();\n"
                "#ifndef NOT_GUARD\n"
                "# define NOT_GUARD\n"
                "code();\n"
                "#endif",
                Style);
@@ ... @@ EXPECT_EQ("code();\n"
             "code();\n"
             "#endif",
             format("code();\n"
                    "#ifndef HEADER_H\n"
                    "# define HEADER_H\n"
                    "code();\n"
                    "#endif",
                    Style));
-  // FIXME: The comment indent corrector in TokenAnnotator gets thrown off by
-  // preprocessor indentation.
-  EXPECT_EQ("#if 1\n"
-            " // comment\n"
-            "# define A 0\n"
-            "// comment\n"
-            "# define B 0\n"
-            "#endif",
-            format("#if 1\n"
-                   "// comment\n"
-                   "# define A 0\n"
-                   " // comment\n"
-                   "# define B 0\n"
-                   "#endif",
-                   Style));
+  // Comments aligned to macros stay aligned. This test is incompatible with
+  // verifyFormat() because messUp() removes the alignment.
+  {
+    const char *Expected = "// Level 0 unaligned comment\n"
+                           "#ifndef HEADER_H\n"
+                           "// Level 0 aligned comment\n"
+                           "#define HEADER_H\n"
+                           "#if 1\n"
+                           " // aligned comment\n"
+                           "# define A 0\n"
+                           "// un-aligned comment\n"
+                           "# define B 0\n"
+                           "#endif\n"
+                           "#endif";
+    const char *ToFormat = " // Level 0 unaligned comment\n"
+                           "#ifndef HEADER_H\n"
+                           "// Level 0 aligned comment\n"
+                           "#define HEADER_H\n"
+                           "#if 1\n"
+                           " // aligned comment\n"
+                           "# define A 0\n"
+                           " // un-aligned comment\n"
+                           "# define B 0\n"
+                           "#endif\n"
+                           "#endif";
+    EXPECT_EQ(Expected, format(ToFormat, Style));
+    EXPECT_EQ(Expected, format(Expected, Style));
+  }
   // Test with tabs.
   Style.UseTab = FormatStyle::UT_Always;
   Style.IndentWidth = 8;
   Style.TabWidth = 8;
   verifyFormat("#ifdef _WIN32\n"
                "#\tdefine A 0\n"
                "#\tifdef VAR2\n"
                "#\t\tdefine B 1\n"
                "#\t\tinclude <someheader.h>\n"
                "#\t\tdefine MACRO \\\n"
                "\t\t\tsome_very_long_func_aaaaaaaaaa();\n"
                "#\tendif\n"
                "#else\n"
                "#\tdefine A 1\n"
                "#endif",
@@ ... @@

 TEST_F(FormatTest, FormatHashIfNotAtStartOfLine) {
   verifyFormat("{\n { a #c; }\n}");
 }

 TEST_F(FormatTest, FormatUnbalancedStructuralElements) {
   EXPECT_EQ("#define A \\\n { \\\n {\nint i;",
             format("#define A { {\nint i;", getLLVMStyleWithColumns(11)));
   EXPECT_EQ("#define A \\\n } \\\n }\nint i;",
             format("#define A } }\nint i;", getLLVMStyleWithColumns(11)));
 }

 TEST_F(FormatTest, EscapedNewlines) {
   FormatStyle Narrow = getLLVMStyleWithColumns(11);
   EXPECT_EQ("#define A \\\n int i; \\\n int j;",
             format("#define A \\\nint i;\\\n int j;", Narrow));
   EXPECT_EQ("#define A\n\nint i;", format("#define A \\\n\n int i;"));
@@ ... @@
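
The new test calls `format()` twice, once on `ToFormat` and once on `Expected`, instead of using `verifyFormat()`. The second call checks that `Expected` is a fixed point, and that is exactly the property the toggling bug violated. A generic version of that two-step check might look like the sketch below. It is written against a hypothetical `FormatFn` callback rather than clang-format's test fixture. `expectStableFormat` and `toggleFirstSpace` are invented names, and the toy formatter exists only to show that a toggling formatter passes the first comparison but fails the second.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical formatter signature; clang-format's test fixture differs.
using FormatFn = std::function<std::string(const std::string &)>;

// True if Input formats to Expected and Expected is a fixed point, i.e.
// formatting it again changes nothing.
bool expectStableFormat(const FormatFn &Format, const std::string &Expected,
                        const std::string &Input) {
  return Format(Input) == Expected && Format(Expected) == Expected;
}

// Toy formatter that misbehaves like the bug in the summary: it adds a
// leading space when there is none and removes it when there is one.
std::string toggleFirstSpace(const std::string &Code) {
  if (!Code.empty() && Code[0] == ' ')
    return Code.substr(1);
  return " " + Code;
}
```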