This is an archive of the discontinued LLVM Phabricator instance.


Differential D123676

[clang-format] Fix WhitespaceSensitiveMacros not being honoured when macro closing parenthesis is followed by a newline.
Closed, Public

Authored by curdeius on Apr 13 2022, 5:54 AM.


Details

Reviewers

MyDeveloperDay
HazardyKnusperkeks
owenpan
ksyx
rymiel

Commits

rG50cd52d93572: [clang-format] Fix WhitespaceSensitiveMacros not being honoured when macro…

Summary

Fixes https://github.com/llvm/llvm-project/issues/54522.

This fixes a regression introduced in https://github.com/llvm/llvm-project/commit/5e5efd8a91f2e340e79a73bedbc6ab66ad4a4281.

Before the culprit commit, macros in WhitespaceSensitiveMacros were correctly formatted even if their closing parenthesis wasn't followed by a semicolon (or, to be precise, when it was followed by a newline).
That commit changed the macro token's type from TT_UntouchableMacroFunc to TT_FunctionLikeOrFreestandingMacro.

Correct formatting (with WhitespaceSensitiveMacros = ['FOO']):

FOO(1+2)
FOO(1+2);

Regressed formatting:

FOO(1 + 2)
FOO(1+2);

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

curdeius created this revision. Apr 13 2022, 5:54 AM

Herald added a project: Restricted Project. Apr 13 2022, 5:54 AM

curdeius requested review of this revision. Apr 13 2022, 5:54 AM

Herald added a project: Restricted Project. Apr 13 2022, 5:54 AM

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B159434: Diff 422494. Apr 13 2022, 6:38 AM

ksyx added inline comments. Apr 13 2022, 6:42 AM

clang/lib/Format/FormatTokenLexer.h
117–120 ↗	(On Diff #422494)	Would giving the constructor of `struct MacroTokenInfo` a default parameter, or overloading it, help avoid adding `, /*Finalized=*/false` to the existing initializer lists?

curdeius added inline comments. Apr 13 2022, 7:42 AM

clang/lib/Format/FormatTokenLexer.h
117–120 ↗	(On Diff #422494)	I've thought about it, but it would mean that we have a non-explicit 1-arg ctor. I'm not a big fan of these as they trigger implicit conversions. I could do this, though: `struct MacroTokenInfo { TokenType Type; bool Finalized{false}; };` but we'd still need to add braces in: `Macros.insert({Identifier, {TT_ForEachMacro}});`

ksyx added inline comments. Apr 13 2022, 7:54 AM

clang/lib/Format/FormatTokenLexer.h
117–120 ↗	(On Diff #422494)	Yes, they are both good points to consider. My starting point was just that the `Finalized` property is less frequently `true`.
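A minimal sketch of the aggregate alternative discussed in this thread, as standalone C++14 (or later) code. The trimmed `TokenType` enum, the string-keyed `MacroMap` alias, `buildMacros`, `isFinalizedMacro` and the macro names are hypothetical stand-ins, not clang-format's real declarations. It shows that with a default member initializer the inner braces `{TT_ForEachMacro}` leave `Finalized` at `false`, without introducing any converting constructor.

```cpp
#include <map>
#include <string>

// Hypothetical, trimmed-down stand-in for clang-format's TokenType enum.
enum TokenType { TT_Unknown, TT_ForEachMacro, TT_IfMacro, TT_UntouchableMacroFunc };

// Aggregate with a default member initializer (valid since C++14):
// no user-declared constructor, hence no implicit conversion from TokenType.
struct MacroTokenInfo {
  TokenType Type;
  bool Finalized{false};
};

// Hypothetical alias; the real lexer keys its table by IdentifierInfo *.
using MacroMap = std::map<std::string, MacroTokenInfo>;

// Registers a few macros the way the lexer's constructor might.
MacroMap buildMacros() {
  MacroMap Macros;
  // The inner braces are still needed: {TT_ForEachMacro} initializes Type
  // and leaves Finalized at its default value, false.
  Macros.insert({"Q_FOREACH", {TT_ForEachMacro}});
  Macros.insert({"FOO", {TT_UntouchableMacroFunc, /*Finalized=*/true}});
  return Macros;
}

// Returns true only for registered macros whose type should be finalized.
bool isFinalizedMacro(const MacroMap &Macros, const std::string &Name) {
  auto It = Macros.find(Name);
  return It != Macros.end() && It->second.Finalized;
}
```

Because `MacroTokenInfo` stays an aggregate, a statement such as `MacroTokenInfo Info = TT_IfMacro;` does not compile, which is exactly the implicit conversion a non-explicit one-argument constructor would have allowed.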

owenpan added inline comments. Apr 13 2022, 6:36 PM

clang/lib/Format/UnwrappedLineParser.cpp
1791	Can we simply do this and leave `FormatTokenLexer` alone?
clang/unittests/Format/FormatTest.cpp
23611	Do we really need this test case?

curdeius added inline comments. Apr 13 2022, 10:32 PM

clang/lib/Format/UnwrappedLineParser.cpp
1791	We can too. It seemed hacky to me because we can miss `TT_UntouchableMacroFunc` in other places. Setting the token type finalized in the lexer will avoid such problems in the future. I'm okay, however, with just applying your suggestion.
clang/unittests/Format/FormatTest.cpp
23611	Not really. I just wrote it to cover both cases, but it's indeed covered by existing cases. Will remove.

owenpan added inline comments. Apr 14 2022, 2:08 AM

clang/lib/Format/FormatTokenLexer.cpp
1034–1035	It seems we can simply do this and leave the rest of `FormatTokenLexer` alone.

owenpan added inline comments. Apr 14 2022, 2:11 AM

clang/lib/Format/UnwrappedLineParser.cpp
1791	> We can too. It seemed hacky to me because we can miss `TT_UntouchableMacroFunc` in other places. Setting the token type finalized in the lexer will avoid such problems in the future.

Yeah. Please see my comment above, though.

HazardyKnusperkeks added inline comments. Apr 21 2022, 2:06 AM

clang/lib/Format/FormatTokenLexer.cpp
1034–1035	+1
clang/lib/Format/FormatTokenLexer.h
117–120 ↗	(On Diff #422494)	I wouldn't add a ctor, and I also wouldn't add a default initializer. But the latter is better.

simon.giesecke added a subscriber: simon.giesecke. Apr 21 2022, 5:43 AM

Simplify. Address comments.

curdeius marked 5 inline comments as done. May 6 2022, 1:58 AM

curdeius added inline comments.

clang/unittests/Format/FormatTest.cpp
23611	On second thought, we don't have any other test with a semicolon and a newline, so I'd rather keep this test.

Harbormaster completed remote builds in B163094: Diff 427566. May 6 2022, 3:36 AM

LGTM

This revision is now accepted and ready to land. May 6 2022, 8:34 AM

HazardyKnusperkeks accepted this revision. May 6 2022, 1:48 PM

Thanks!

Closed by commit rG50cd52d93572: [clang-format] Fix WhitespaceSensitiveMacros not being honoured when macro… (authored by curdeius). May 9 2022, 1:59 AM

This revision was automatically updated to reflect the committed changes.

curdeius added a commit: rG50cd52d93572: [clang-format] Fix WhitespaceSensitiveMacros not being honoured when macro….

It looks like this regressed the following example by adding an unwanted level of indentation to the #elif B branch:

% ./clang-format --version
clang-format version 15.0.0 (https://github.com/llvm/llvm-project.git 50cd52d9357224cce66a9e00c9a0417c658a5655)
% cat test.cc             
#define MACRO_BEGIN

MACRO_BEGIN

namespace internal {

#if A
int f() { return 0; }
#elif B
int f() { return 1; }
#endif

}  // namespace internal
% ./clang-format test.cc
#define MACRO_BEGIN

MACRO_BEGIN

namespace internal {

#if A
int f() { return 0; }
#elif B
  int f() { return 1; }
#endif

} // namespace internal
%

@curdeius could you please take a look?

In D123676#3515949, @krasimir wrote:

> It looks like this regressed the following example by adding an unwanted level of indentation to the #elif B branch:

Sure, I'll have a look.
It seems that even this:

MACRO_BEGIN
#if A
int f();
#else
int f();
#endif

gets misindented:

MACRO_BEGIN
#if A
int f();
#else
    int
    f();
#endif

We found another regression with this: it wrongly indents an ObjC @interface, or fails to put it on its own line:

% ./clang-format --version           
clang-format version 15.0.0 (https://github.com/llvm/llvm-project.git 50cd52d9357224cce66a9e00c9a0417c658a5655)
% cat test.m
NS_SWIFT_NAME(A)
@interface B : C
@property(readonly) D value;
@end

% ./clang-format test.m
NS_SWIFT_NAME(A) @interface B : C
@property(readonly) D value;
@end
%

curdeius added a reverting change: rG573a5b58001d: Revert "[clang-format] Fix WhitespaceSensitiveMacros not being honoured when…. May 17 2022, 10:27 PM

Reverted for now.

This revision is now accepted and ready to land. May 17 2022, 10:28 PM

Fixed in D132001.

Herald added a project: Restricted Project. Oct 25 2023, 5:35 AM

Herald added a reviewer: rymiel.

Herald added a subscriber: wangpc.

Revision Contents

Path                                        Size
clang/lib/Format/FormatTokenLexer.cpp       5 lines
clang/lib/Format/UnwrappedLineParser.cpp    3 lines
clang/unittests/Format/FormatTest.cpp       5 lines

Diff 428005

clang/lib/Format/FormatTokenLexer.cpp

[1,021 lines not shown]  FormatToken *FormatTokenLexer::getNextToken() {
   }
 
   if (Style.isCpp()) {
     auto it = Macros.find(FormatTok->Tok.getIdentifierInfo());
     if (!(Tokens.size() > 0 && Tokens.back()->Tok.getIdentifierInfo() &&
           Tokens.back()->Tok.getIdentifierInfo()->getPPKeywordID() ==
               tok::pp_define) &&
         it != Macros.end()) {
-      FormatTok->setType(it->second);
+      if (it->second == TT_UntouchableMacroFunc)
+        FormatTok->setFinalizedType(TT_UntouchableMacroFunc);
+      else
+        FormatTok->setType(it->second);
       if (it->second == TT_IfMacro) {
         // The lexer token currently has type tok::kw_unknown. However, for this

owenpan (inline comment, Done), suggested edit:

         it != Macros.end()) {
    -      if (it->second.Finalized) {
    -        FormatTok->setFinalizedType(it->second.Type);
    -      } else {
    -        FormatTok->setType(it->second.Type);
    -      }
    +      if (it->second == TT_UntouchableMacroFunc)
    +        FormatTok->setFinalizedType(TT_UntouchableMacroFunc);
    +      else
    +        FormatTok->setType(it->second);
           if (it->second.Type == TT_IfMacro) {

    It seems we can simply do this and leave the rest of `FormatTokenLexer` alone.

HazardyKnusperkeks (inline comment, Done): +1

         // substitution to be treated correctly in the TokenAnnotator, faking
         // the tok value seems to be needed. Not sure if there's a more elegant
         // way.
         FormatTok->Tok.setKind(tok::kw_if);
       }
     } else if (FormatTok->is(tok::identifier)) {
       if (MacroBlockBeginRegex.match(Text))
         FormatTok->setType(TT_MacroBlockBegin);
[50 lines not shown]
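The patched lexer branch can be modelled in isolation. In this sketch the `Token` struct, the string-keyed macro map, `classifyMacro` and the `PrecededByDefine` flag are hypothetical simplifications (the real lexer keys `Macros` by `IdentifierInfo` and inspects the previous token itself). It also assumes that a plain `setType` leaves an already-finalized type untouched, which may not match every clang-format version.

```cpp
#include <map>
#include <string>

// Hypothetical, trimmed-down stand-ins; not clang-format's real classes.
enum TokenType {
  TT_Unknown,
  TT_IfMacro,
  TT_ForEachMacro,
  TT_UntouchableMacroFunc,
  TT_FunctionLikeOrFreestandingMacro
};

struct Token {
  std::string Text;
  TokenType Type = TT_Unknown;
  bool TypeIsFinalized = false;

  // Assumption: a plain setType() leaves a finalized type untouched.
  void setType(TokenType T) {
    if (!TypeIsFinalized)
      Type = T;
  }
  // Sets the type and locks it against later setType() calls.
  void setFinalizedType(TokenType T) {
    Type = T;
    TypeIsFinalized = true;
  }
};

// Models the patched branch: skip the lookup right after "#define", and
// finalize only TT_UntouchableMacroFunc (the WhitespaceSensitiveMacros type).
void classifyMacro(Token &Tok, bool PrecededByDefine,
                   const std::map<std::string, TokenType> &Macros) {
  auto It = Macros.find(Tok.Text);
  if (PrecededByDefine || It == Macros.end())
    return;
  if (It->second == TT_UntouchableMacroFunc)
    Tok.setFinalizedType(TT_UntouchableMacroFunc);
  else
    Tok.setType(It->second);
}
```

Only whitespace-sensitive macros get locked; other configured macro types (for example `TT_ForEachMacro`) stay overridable, which keeps the change as narrow as owenpan's suggestion intended.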

clang/lib/Format/UnwrappedLineParser.cpp

[1,781 lines not shown]  case tok::identifier: {
       parseParens();
     bool FollowedByNewline =
         CommentsBeforeNextToken.empty()
             ? FormatTok->NewlinesBefore > 0
             : CommentsBeforeNextToken.front()->NewlinesBefore > 0;
     if (FollowedByNewline && (Text.size() >= 5 || FunctionLike) &&
-        tokenCanStartNewLine(*FormatTok) && Text == Text.upper()) {
+        tokenCanStartNewLine(*FormatTok) && Text == Text.upper() &&
+        !PreviousToken->isTypeFinalized()) {

owenpan (inline comment, Done), suggested edit:

             tokenCanStartNewLine(*FormatTok) && Text == Text.upper() &&
    -        !PreviousToken->isTypeFinalized()) {
    +        PreviousToken->isNot(TT_UntouchableMacroFunc)) {
           PreviousToken->setFinalizedType(TT_FunctionLikeOrFreestandingMacro);

    Can we simply do this and leave `FormatTokenLexer` alone?

curdeius (author, inline comment, Done):

    We can too. It seemed hacky to me because we can miss `TT_UntouchableMacroFunc` in other places.
    Setting the token type finalized in the lexer will avoid such problems in the future.
    I'm okay, however, with just applying your suggestion.

owenpan (inline comment, Done):

    > We can too. It seemed hacky to me because we can miss `TT_UntouchableMacroFunc` in other places.
    > Setting the token type finalized in the lexer will avoid such problems in the future.

    Yeah. Please see my comment above, though.

       PreviousToken->setFinalizedType(TT_FunctionLikeOrFreestandingMacro);
       addUnwrappedLine();
       return;
     }
     break;
   }
   case tok::equal:
[2,327 lines not shown]
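For the parser side, here is a standalone sketch of the freestanding-macro heuristic with the new `isTypeFinalized()` guard. `Token`, `followedByNewline`, `isAllUpper` and `maybeMarkFreestandingMacro` are hypothetical names; boolean parameters stand in for `tokenCanStartNewLine(*FormatTok)` and for the comment-aware newline check shown in the diff above.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-ins modelling only what the heuristic reads and writes.
enum TokenType {
  TT_Unknown,
  TT_UntouchableMacroFunc,
  TT_FunctionLikeOrFreestandingMacro
};

struct Token {
  TokenType Type = TT_Unknown;
  bool TypeIsFinalized = false;
  void setFinalizedType(TokenType T) {
    Type = T;
    TypeIsFinalized = true;
  }
};

// Mirrors FollowedByNewline in the diff: when comments sit between the ')'
// and the next token, the first comment's NewlinesBefore decides.
bool followedByNewline(bool HasCommentsBefore, unsigned NextTokNewlines,
                       unsigned FirstCommentNewlines) {
  return !HasCommentsBefore ? NextTokNewlines > 0 : FirstCommentNewlines > 0;
}

// Stand-in for Text == Text.upper(): no ASCII lowercase letters.
bool isAllUpper(const std::string &Text) {
  for (unsigned char C : Text)
    if (std::islower(C))
      return false;
  return true;
}

// Simplified model of the patched condition; NextCanStartLine stands in for
// tokenCanStartNewLine(*FormatTok). Returns true if Macro was re-typed.
bool maybeMarkFreestandingMacro(Token &Macro, const std::string &Text,
                                bool FunctionLike, bool FollowedByNewline,
                                bool NextCanStartLine) {
  if (FollowedByNewline && (Text.size() >= 5 || FunctionLike) &&
      NextCanStartLine && isAllUpper(Text) && !Macro.TypeIsFinalized) {
    Macro.setFinalizedType(TT_FunctionLikeOrFreestandingMacro);
    return true;
  }
  return false;
}
```

In this model a token that the lexer already finalized as `TT_UntouchableMacroFunc` is skipped, so it is never re-typed as `TT_FunctionLikeOrFreestandingMacro`. Per the summary, keeping that type is what leaves `FOO(1+2)` untouched when the closing parenthesis is followed by a newline.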

clang/unittests/Format/FormatTest.cpp

[23,600 lines not shown]
 }
 
 TEST_F(FormatTest, WhitespaceSensitiveMacros) {
   FormatStyle Style = getLLVMStyle();
   Style.WhitespaceSensitiveMacros.push_back("FOO");
 
   // Don't use the helpers here, since 'mess up' will change the whitespace
   // and these are all whitespace sensitive by definition
 
+  // Newlines are important here.
+  EXPECT_EQ("FOO(1+2 );\n", format("FOO(1+2 );\n", Style));

owenpan (inline comment, Done): Do we really need this test case?
curdeius (author, inline comment, Done): Not really. I just wrote it to cover both cases, but it's indeed covered by existing cases. Will remove.
curdeius (author, inline comment, Done): On second thought, we don't have any other test with a semicolon and a newline, so I'd rather keep this test.

+  EXPECT_EQ("FOO(1+2 )\n", format("FOO(1+2 )\n", Style));
 
   EXPECT_EQ("FOO(String-ized&Messy+But(: :Still)=Intentional);",
             format("FOO(String-ized&Messy+But(: :Still)=Intentional);", Style));
   EXPECT_EQ(
       "FOO(String-ized&Messy+But\\(: :Still)=Intentional);",
       format("FOO(String-ized&Messy+But\\(: :Still)=Intentional);", Style));
   EXPECT_EQ("FOO(String-ized&Messy+But,: :Still=Intentional);",
             format("FOO(String-ized&Messy+But,: :Still=Intentional);", Style));
   EXPECT_EQ("FOO(String-ized&Messy+But,: :\n"
[2,154 lines not shown]
