This is an archive of the discontinued LLVM Phabricator instance.


Differential D136100

[clang-format] Do not parse certain characters in pragma directives
Closed · Public

Authored by jhuber6 on Oct 17 2022, 11:36 AM.


Details

Reviewers

jdoerfert
JonChesterfield
ronlieb
MyDeveloperDay
owenpan
HazardyKnusperkeks
curdeius
wanders

Commits

rG037669de8bdf: [clang-format] Do not parse certain characters in pragma directives

Summary

Currently, we parse lines inside a compiler #pragma the same way we
parse any other line. This is fine in some cases, like separating
expressions and adding proper spacing, but in others it produces poor
results because some tokens are miscategorized.

For example, OpenMP offloading uses clauses that contain special
characters, like map(tofrom : A[0:N]). This is formatted poorly because
the line is split at the first colon. Additionally, the subscript
notation leads to poor spacing. This can be seen in the OpenMP tests,
where running clang-format inevitably ruins the formatting.

Consider the following contrived example:

#pragma omp target teams distribute collapse(2) map(to: A[0 : M * K])  \
    map(to: B[0:K * N]) map(tofrom:C[0:M*N]) firstprivate(Alpha) \
    firstprivate(Beta) firstprivate(X) firstprivate(D) firstprivate(Y) \
    firstprivate(E) firstprivate(Z) firstprivate(F)

clang-format turns it into the following, which is far from ideal:

#pragma omp target teams distribute collapse(2) map(to                         \
                                                    : A [0:M * K])             \
    map(to                                                                     \
        : B [0:K * N]) map(tofrom                                              \
                           : C [0:M * N]) firstprivate(Alpha)                  \
        firstprivate(Beta) firstprivate(X) firstprivate(D) firstprivate(Y)     \
            firstprivate(E) firstprivate(Z) firstprivate(F)

This patch improves this by adding extra logic at the points where the
parsing goes awry. The problems are primarily caused by the colon being
annotated as an inline-asm colon and the brackets as an Objective-C
method expression. In addition, the line is indented further every time
it is wrapped.

This patch does not implement true parsing of OpenMP directives.
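The two mis-annotations described in the summary can be illustrated with a toy model. This is a minimal sketch, not clang-format's implementation: `Kind`, `ToyLine` and `annotate` are hypothetical names, tokens are pre-split strings, any `[` is treated as a subscript, and the real TokenAnnotator applies many more conditions (Verilog, C++11 attributes, ObjC selectors) before choosing `TT_InlineASMColon` or `TT_ObjCMethodExpr`. The only point it shows is that one line-level flag is enough to switch both rules off.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two token types involved in the patch:
// TT_InlineASMColon and TT_ObjCMethodExpr.
enum class Kind { Plain, InlineAsmColon, ObjCMethodBracket };

// Hypothetical toy line: pre-split tokens plus the flag this patch adds.
struct ToyLine {
  std::vector<std::string> Tokens;
  bool InPragmaDirective;
};

// Returns one Kind per token. A ':' whose innermost open bracket is '(' is
// tagged as an inline-asm colon; a ':' whose innermost open bracket is '['
// re-tags that '[' as an ObjC method-expression bracket. Neither rule fires
// when the line is a pragma directive.
std::vector<Kind> annotate(const ToyLine &Line) {
  std::vector<Kind> Kinds(Line.Tokens.size(), Kind::Plain);
  std::vector<std::size_t> Open; // indices of currently open '(' and '['
  for (std::size_t I = 0; I < Line.Tokens.size(); ++I) {
    const std::string &Tok = Line.Tokens[I];
    if (Tok == "(" || Tok == "[") {
      Open.push_back(I);
    } else if (Tok == ")" || Tok == "]") {
      if (!Open.empty())
        Open.pop_back();
    } else if (Tok == ":" && !Open.empty() && !Line.InPragmaDirective) {
      std::size_t Left = Open.back();
      if (Line.Tokens[Left] == "(")
        Kinds[I] = Kind::InlineAsmColon;
      else
        Kinds[Left] = Kind::ObjCMethodBracket;
    }
  }
  return Kinds;
}
```

With the flag cleared, the tokens of `map(tofrom : A[0:N])` come back with the first colon as an inline-asm colon and the `[` as an ObjC bracket, mirroring the two problems named in the summary; with the flag set, every token stays `Plain`.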

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. Oct 17 2022, 11:36 AM

Herald added a project: Restricted Project. Oct 17 2022, 11:36 AM

jhuber6 requested review of this revision. Oct 17 2022, 11:36 AM


Herald added subscribers: cfe-commits, sstefan1.

Pretty interesting, it looks ok from what I can tell, let the others take a look

MyDeveloperDay added a project: Restricted Project. Oct 17 2022, 2:58 PM

In D136100#3863427, @MyDeveloperDay wrote:

Pretty interesting, it looks ok from what I can tell, let the others take a look

Thanks, I was originally hoping I could avoid adding a new boolean for InPragma by asking something like Line.startswith(tok::pp_pragma) but that didn't seem to work.

HazardyKnusperkeks accepted this revision. Oct 18 2022, 1:45 PM

This revision is now accepted and ready to land. Oct 18 2022, 1:45 PM

This revision was landed with ongoing or failed builds. Oct 18 2022, 2:38 PM

Closed by commit rG037669de8bdf: [clang-format] Do not parse certain characters in pragma directives (authored by jhuber6).

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rG037669de8bdf: [clang-format] Do not parse certain characters in pragma directives.

Chromium is seeing a formatting regression after this: https://github.com/llvm/llvm-project/issues/59473

In D136100#3988424, @hans wrote:

Chromium is seeing a formatting regression after this: https://github.com/llvm/llvm-project/issues/59473

My guess is that it comes from this line; we could be more specific about the type of pragma.

if (State.Line->InPragmaDirective)
  return CurrentState.Indent + Style.ContinuationIndentWidth;
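The quoted rule keys only on the line-level flag, so it fires for every pragma that has to wrap, not just OpenMP ones. A minimal sketch of that behavior, using hypothetical names (`ToyState`, `wrappedPragmaColumn`) and modelling only the fields the two quoted lines read; the sentinel `-1` stands in for the rest of the indenter's column computation:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the parts of the indenter state that the quoted
// rule reads.
struct ToyState {
  std::string Directive;  // for illustration only; the rule never reads it
  bool InPragmaDirective; // State.Line->InPragmaDirective
  unsigned Indent;        // CurrentState.Indent
};

// Mirrors the two quoted lines: any wrapped line inside a pragma goes one
// continuation indent past the current indent. Returns -1 when the rule does
// not apply, standing in for the rest of the column computation.
int wrappedPragmaColumn(const ToyState &S, unsigned ContinuationIndentWidth) {
  if (S.InPragmaDirective)
    return static_cast<int>(S.Indent + ContinuationIndentWidth);
  return -1;
}
```

With LLVM style's ContinuationIndentWidth of 4 and an indent of 0, a wrapped `#pragma omp` clause and a wrapped line of any other pragma both land at column 4, because the directive name is never consulted.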

In D136100#3988424, @hans wrote:

Chromium is seeing a formatting regression after this: https://github.com/llvm/llvm-project/issues/59473

Can we get more specific about what Chromium is seeing?

Can we get more specific about what Chromium is seeing?

https://github.com/llvm/llvm-project/issues/59473 has a standalone repro. What else would you like to see?

I think you need to parse the first paren as a whole and not indent it.

Should we revert this patch?

clang/unittests/Format/FormatTest.cpp
5175	Why was this test case changed? It seemed to be related to the regression mentioned in D136100#3988574.

I would prefer that this not be reverted; see the summary for the awful results for OpenMP without this patch. A potential solution would be to parse the next token and only add the indent if it is omp.
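The suggestion of looking at the token after `pragma` can be sketched as follows. This is a toy model over pre-split tokens with hypothetical names (`ToyDirective`, `isOmpPragma`, `wrappedPragmaExtraIndent`); it is not the code of the follow-up change D144884, which may inspect the tokens differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy directive: the tokens of one preprocessor line, pre-split.
struct ToyDirective {
  std::vector<std::string> Tokens;
};

// True for lines of the form "# pragma omp ...".
bool isOmpPragma(const ToyDirective &D) {
  return D.Tokens.size() >= 3 && D.Tokens[0] == "#" &&
         D.Tokens[1] == "pragma" && D.Tokens[2] == "omp";
}

// Extra indentation for a wrapped continuation line: only OpenMP pragmas get
// the continuation indent; every other directive gets none.
unsigned wrappedPragmaExtraIndent(const ToyDirective &D,
                                  unsigned ContinuationIndentWidth) {
  return isOmpPragma(D) ? ContinuationIndentWidth : 0;
}
```

With a width of 4, an OpenMP directive keeps its extra indent, while `#pragma comment(...)` and a bare `#pragma` get none, which is the narrowing this comment proposes.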

clang/unittests/Format/FormatTest.cpp
5175	It's definitely related, we want some indentation for successive OpenMP clauses pushed to a new line, see below.

jhuber6 mentioned this in D144884: [clang-format] Only add pragma continuation indentation for 'omp' clauses. Feb 27 2023, 8:39 AM

jhuber6 mentioned this in rG466b4327f8fc: [clang-format] Only add pragma continuation indentation for 'omp' clauses. Feb 28 2023, 1:16 PM

Revision Contents

Path                                        Size
clang/lib/Format/ContinuationIndenter.cpp   3 lines
clang/lib/Format/TokenAnnotator.h           5 lines
clang/lib/Format/TokenAnnotator.cpp         5 lines
clang/lib/Format/UnwrappedLineParser.h      8 lines
clang/lib/Format/UnwrappedLineParser.cpp    8 lines
clang/unittests/Format/FormatTest.cpp       23 lines

Diff 468705

clang/lib/Format/ContinuationIndenter.cpp

@@ ... @@ if (CurrentState.StartOfArraySubscripts != 0) {
       return CurrentState.StartOfArraySubscripts;
     } else if (Style.isCSharp()) { // C# allows `["key"] = value` inside object
                                    // initializers.
       return CurrentState.Indent;
     }
     return ContinuationIndent;
   }
 
+  if (State.Line->InPragmaDirective)
+    return CurrentState.Indent + Style.ContinuationIndentWidth;
+
   // This ensure that we correctly format ObjC methods calls without inputs,
   // i.e. where the last element isn't selector like: [callee method];
   if (NextNonComment->is(tok::identifier) && NextNonComment->FakeRParens == 0 &&
       NextNonComment->Next && NextNonComment->Next->is(TT_ObjCMethodExpr)) {
     return CurrentState.Indent;
   }
 
   if (NextNonComment->isOneOf(TT_StartOfName, TT_PointerOrReference) ||

clang/lib/Format/TokenAnnotator.h

@@ ... @@
 };
 
 class AnnotatedLine {
 public:
   AnnotatedLine(const UnwrappedLine &Line)
       : First(Line.Tokens.front().Tok), Level(Line.Level),
         MatchingOpeningBlockLineIndex(Line.MatchingOpeningBlockLineIndex),
         MatchingClosingBlockLineIndex(Line.MatchingClosingBlockLineIndex),
-        InPPDirective(Line.InPPDirective), InMacroBody(Line.InMacroBody),
+        InPPDirective(Line.InPPDirective),
+        InPragmaDirective(Line.InPragmaDirective),
+        InMacroBody(Line.InMacroBody),
         MustBeDeclaration(Line.MustBeDeclaration), MightBeFunctionDecl(false),
         IsMultiVariableDeclStmt(false), Affected(false),
         LeadingEmptyLinesAffected(false), ChildrenAffected(false),
         IsContinuation(Line.IsContinuation),
         FirstStartColumn(Line.FirstStartColumn) {
     assert(!Line.Tokens.empty());
 
     // Calculate Next and Previous for all tokens. Note that we must overwrite
@@ ... @@ public:
 
   SmallVector<AnnotatedLine *, 0> Children;
 
   LineType Type;
   unsigned Level;
   size_t MatchingOpeningBlockLineIndex;
   size_t MatchingClosingBlockLineIndex;
   bool InPPDirective;
+  bool InPragmaDirective;
   bool InMacroBody;
   bool MustBeDeclaration;
   bool MightBeFunctionDecl;
   bool IsMultiVariableDeclStmt;
 
   /// \c True if this line should be formatted, i.e. intersects directly or
   /// indirectly with one of the input ranges.
   bool Affected;

clang/lib/Format/TokenAnnotator.cpp

@@ ... @@ while (CurrentToken) {
         return false;
       if (CurrentToken->is(tok::colon)) {
         if (IsCpp11AttributeSpecifier &&
             CurrentToken->endsSequence(tok::colon, tok::identifier,
                                        tok::kw_using)) {
           // Remember that this is a [[using ns: foo]] C++ attribute, so we
           // don't add a space before the colon (unlike other colons).
           CurrentToken->setType(TT_AttributeColon);
-        } else if (!Style.isVerilog() &&
+        } else if (!Style.isVerilog() && !Line.InPragmaDirective &&
                    Left->isOneOf(TT_ArraySubscriptLSquare,
                                  TT_DesignatedInitializerLSquare)) {
           Left->setType(TT_ObjCMethodExpr);
           StartsObjCMethodExpr = true;
           Contexts.back().ColonIsObjCMethodExpr = true;
           if (Parent && Parent->is(tok::r_paren)) {
             // FIXME(bug 36976): ObjC return types shouldn't use TT_CastRParen.
             Parent->setType(TT_CastRParen);
@@ ... @@ case tok::colon:
         }
       } else if (canBeObjCSelectorComponent(*Tok->Previous) && Tok->Next &&
                  (Tok->Next->isOneOf(tok::r_paren, tok::comma) ||
                   (canBeObjCSelectorComponent(*Tok->Next) && Tok->Next->Next &&
                    Tok->Next->Next->is(tok::colon)))) {
         // This handles a special macro in ObjC code where selectors including
         // the colon are passed as macro arguments.
         Tok->setType(TT_ObjCMethodExpr);
-      } else if (Contexts.back().ContextKind == tok::l_paren) {
+      } else if (Contexts.back().ContextKind == tok::l_paren &&
+                 !Line.InPragmaDirective) {
         Tok->setType(TT_InlineASMColon);
       }
       break;
     case tok::pipe:
     case tok::amp:
       // | and & in declarations/type expressions represent union and
       // intersection types, respectively.
       if (Style.isJavaScript() && !Contexts.back().IsExpression)

clang/lib/Format/UnwrappedLineParser.h

@@ ... @@ struct UnwrappedLine {
   /// The \c Tokens comprising this \c UnwrappedLine.
   std::list<UnwrappedLineNode> Tokens;
 
   /// The indent level of the \c UnwrappedLine.
   unsigned Level;
 
   /// Whether this \c UnwrappedLine is part of a preprocessor directive.
   bool InPPDirective;
+  /// Whether this \c UnwrappedLine is part of a pramga directive.
+  bool InPragmaDirective;
   /// Whether it is part of a macro body.
   bool InMacroBody;
 
   bool MustBeDeclaration;
 
   /// \c True if this line should be indented by ContinuationIndent in
   /// addition to the normal indention level.
   bool IsContinuation = false;
@@ ... @@ private:
   void parseChildBlock(bool CanContainBracedList = true,
                        TokenType NextLBracesType = TT_Unknown);
   void parsePPDirective();
   void parsePPDefine();
   void parsePPIf(bool IfDef);
   void parsePPElIf();
   void parsePPElse();
   void parsePPEndIf();
+  void parsePPPragma();
   void parsePPUnknown();
   void readTokenWithJavaScriptASI();
   void parseStructuralElement(bool IsTopLevel = false,
                               TokenType NextLBracesType = TT_Unknown,
                               IfStmtKind *IfKind = nullptr,
                               FormatToken **IfLeftBrace = nullptr,
                               bool *HasDoWhile = nullptr,
                               bool *HasLabel = nullptr);
@@ ... @@ struct UnwrappedLineNode {
   UnwrappedLineNode() : Tok(nullptr) {}
   UnwrappedLineNode(FormatToken *Tok) : Tok(Tok) {}
 
   FormatToken *Tok;
   SmallVector<UnwrappedLine, 0> Children;
 };
 
 inline UnwrappedLine::UnwrappedLine()
-    : Level(0), InPPDirective(false), InMacroBody(false),
-      MustBeDeclaration(false), MatchingOpeningBlockLineIndex(kInvalidIndex) {}
+    : Level(0), InPPDirective(false), InPragmaDirective(false),
+      InMacroBody(false), MustBeDeclaration(false),
+      MatchingOpeningBlockLineIndex(kInvalidIndex) {}
 
 } // end namespace format
 } // end namespace clang
 
 #endif

clang/lib/Format/UnwrappedLineParser.cpp

@@ ... @@ void UnwrappedLineParser::parsePPDirective() {
   case tok::pp_elifdef:
   case tok::pp_elifndef:
   case tok::pp_elif:
     parsePPElIf();
     break;
   case tok::pp_endif:
     parsePPEndIf();
     break;
+  case tok::pp_pragma:
+    parsePPPragma();
+    break;
   default:
     parsePPUnknown();
     break;
   }
 }
 
 void UnwrappedLineParser::conditionalCompilationCondition(bool Unreachable) {
   size_t Line = CurrentLines->size();
@@ ... @@ void UnwrappedLineParser::parsePPDefine() {
   // Errors during a preprocessor directive can only affect the layout of the
   // preprocessor directive, and thus we ignore them. An alternative approach
   // would be to use the same approach we use on the file level (no
   // re-indentation if there was a structural error) within the macro
   // definition.
   parseFile();
 }
 
+void UnwrappedLineParser::parsePPPragma() {
+  Line->InPragmaDirective = true;
+  parsePPUnknown();
+}
+
 void UnwrappedLineParser::parsePPUnknown() {
   do {
     nextToken();
   } while (!eof());
   if (Style.IndentPPDirectives != FormatStyle::PPDIS_None)
     Line->Level += PPBranchLevel + 1;
   addUnwrappedLine();
 }

clang/unittests/Format/FormatTest.cpp


@@ ... @@ verifyFormat("#define MACRO(a) \\\n"
                " f(); \\\n"
                " else \\\n"
                " g()",
                getLLVMStyleWithColumns(18));
   verifyFormat("#define A template <typename T>");
   verifyIncompleteFormat("#define STR(x) #x\n"
                          "f(STR(this_is_a_string_literal{));");
   verifyFormat("#pragma omp threadprivate( \\\n"
                " y)), // expected-warning",
                getLLVMStyleWithColumns(28));
   verifyFormat("#d, = };");
   verifyFormat("#if \"a");
   verifyIncompleteFormat("({\n"
                          "#define b \\\n"
                          " } \\\n"
                          " a\n"
                          "a",
@@ ... @@ TEST_F(FormatTest, UnderstandsPragmas) {
   verifyFormat("#pragma omp reduction(+ : var)");
 
   EXPECT_EQ("#pragma mark Any non-hyphenated or hyphenated string "
             "(including parentheses).",
             format("#pragma mark Any non-hyphenated or hyphenated string "
                    "(including parentheses)."));
 }
 
+TEST_F(FormatTest, UnderstandsPragmaOmpTarget) {
+  verifyFormat("#pragma omp target map(to : var)");
+  verifyFormat("#pragma omp target map(to : var[ : N])");
+  verifyFormat("#pragma omp target map(to : var[0 : N])");
+  verifyFormat("#pragma omp target map(always, to : var[0 : N])");
+
+  EXPECT_EQ(
+      "#pragma omp target \\\n"
+      " reduction(+ : var) \\\n"
+      " map(to : A[0 : N]) \\\n"
+      " map(to : B[0 : N]) \\\n"
+      " map(from : C[0 : N]) \\\n"
+      " firstprivate(i) \\\n"
+      " firstprivate(j) \\\n"
+      " firstprivate(k)",
+      format(
+          "#pragma omp target reduction(+:var) map(to:A[0:N]) map(to:B[0:N]) "
+          "map(from:C[0:N]) firstprivate(i) firstprivate(j) firstprivate(k)",
+          getLLVMStyleWithColumns(26)));
+}
+
 TEST_F(FormatTest, UnderstandPragmaOption) {
   verifyFormat("#pragma option -C -A");
 
   EXPECT_EQ("#pragma option -C -A", format("#pragma option -C -A"));
 }
 
 TEST_F(FormatTest, UnderstandPragmaRegion) {
   auto Style = getLLVMStyleWithColumns(0);