This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Fix WhitespaceSensitiveMacros not being honoured when macro closing parenthesis is followed by a newline.
ClosedPublic

Authored by curdeius on Apr 13 2022, 5:54 AM.

Details

Summary

Fixes https://github.com/llvm/llvm-project/issues/54522.

This fixes regression introduced in https://github.com/llvm/llvm-project/commit/5e5efd8a91f2e340e79a73bedbc6ab66ad4a4281.

Before the culprit commit, macros in WhitespaceSensitiveMacros were correctly formatted even if their closing parenthesis weren't followed by semicolon (or, to be precise, when they were followed by a newline).
That commit changed the type of the macro token type from TT_UntouchableMacroFunc to TT_FunctionLikeOrFreestandingMacro.

Correct formatting (with WhitespaceSensitiveMacros = ['FOO']):

FOO(1+2)
FOO(1+2);

Regressed formatting:

FOO(1 + 2)
FOO(1+2);

Diff Detail

Event Timeline

curdeius created this revision.Apr 13 2022, 5:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 13 2022, 5:54 AM
curdeius requested review of this revision.Apr 13 2022, 5:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 13 2022, 5:54 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ksyx added inline comments.Apr 13 2022, 6:42 AM
clang/lib/Format/FormatTokenLexer.h
117–120

Would making constructor of struct MacroTokenInfo having default parameter or overloading it help avoiding the change of adding , /*Finalized=*/false to the existing initializer lists?

curdeius added inline comments.Apr 13 2022, 7:42 AM
clang/lib/Format/FormatTokenLexer.h
117–120

I've thought about it, but it would mean that we have a non-explicit 1-arg ctor. I'm not a big fan of these as they trigger implicit conversions.
I can do though:

struct MacroTokenInfo {
  TokenType Type;
  bool Finalized{false};
};

but we'd still need adding braces in:

Macros.insert({Identifier, {TT_ForEachMacro}});
ksyx added inline comments.Apr 13 2022, 7:54 AM
clang/lib/Format/FormatTokenLexer.h
117–120

Yes they are both good point to consider and my start point is just that the finalized property is less frequently be true.

owenpan added inline comments.Apr 13 2022, 6:36 PM
clang/lib/Format/UnwrappedLineParser.cpp
1791

Can we simply do this and leave FormatTokenLexer alone?

clang/unittests/Format/FormatTest.cpp
23545

Do we really need this test case?

curdeius added inline comments.Apr 13 2022, 10:32 PM
clang/lib/Format/UnwrappedLineParser.cpp
1791

We can too. It seemed hacky to me because we can miss TT_UntouchableMacroFunc in other places.
Setting the token type finalized in the lexer will avoid such problems in the future.
I'm okay however to just apply your suggestion.

clang/unittests/Format/FormatTest.cpp
23545

Not really. I just wrote it to cover both cases but it's covered by existing cases indeed. Will remove.

owenpan added inline comments.Apr 14 2022, 2:08 AM
clang/lib/Format/FormatTokenLexer.cpp
1031–1035

It seems we can simply do this and leave the rest of FormatTokenLexer alone.

owenpan added inline comments.Apr 14 2022, 2:11 AM
clang/lib/Format/UnwrappedLineParser.cpp
1791

We can too. It seemed hacky to me because we can miss TT_UntouchableMacroFunc in other places.
Setting the token type finalized in the lexer will avoid such problems in the future.

Yeah. Please see my comment above though.

clang/lib/Format/FormatTokenLexer.cpp
1031–1035

+1

clang/lib/Format/FormatTokenLexer.h
117–120

I wouldn't add a CTor. And I also wouldn't add a default initializer. But the latter is better.

curdeius updated this revision to Diff 427566.May 6 2022, 1:56 AM

Simplify. Address comments.

curdeius marked 5 inline comments as done.May 6 2022, 1:58 AM
curdeius added inline comments.
clang/unittests/Format/FormatTest.cpp
23545

On a second thought, we don't have any other test with a semicolon and a newline, so I'd rather keep this test.

owenpan accepted this revision.May 6 2022, 8:34 AM

LGTM

This revision is now accepted and ready to land.May 6 2022, 8:34 AM
ksyx accepted this revision.May 6 2022, 2:30 PM

Thanks!

It looks like this regressed the following example by adding an unwanted level of indentation to the #elif B branch:

% ./clang-format --version
clang-format version 15.0.0 (https://github.com/llvm/llvm-project.git 50cd52d9357224cce66a9e00c9a0417c658a5655)
% cat test.cc             
#define MACRO_BEGIN

MACRO_BEGIN

namespace internal {

#if A
int f() { return 0; }
#elif B
int f() { return 1; }
#endif

}  // namespace internal
% ./clang-format test.cc
#define MACRO_BEGIN

MACRO_BEGIN

namespace internal {

#if A
int f() { return 0; }
#elif B
  int f() { return 1; }
#endif

} // namespace internal
%

@curdeius could you please take a look?

It looks like this regressed the following example by adding an unwanted level of indentation to the #elif B branch:

Sure, I'll have a look.
It seems that even this:

MACRO_BEGIN
#if A
int f();
#else
int f();
#endif

gets misindented:

MACRO_BEGIN
#if A
int f();
#else
    int
    f();
#endif

We found another regression with this in wrongly indenting/not putting on its own line ObjC @interface:

% ./clang-format --version           
clang-format version 15.0.0 (https://github.com/llvm/llvm-project.git 50cd52d9357224cce66a9e00c9a0417c658a5655)
% cat test.m
NS_SWIFT_NAME(A)
@interface B : C
@property(readonly) D value;
@end

% ./clang-format test.m
NS_SWIFT_NAME(A) @interface B : C
@property(readonly) D value;
@end
%
curdeius reopened this revision.May 17 2022, 10:28 PM

Reverted for now.

This revision is now accepted and ready to land.May 17 2022, 10:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 25 2023, 5:35 AM
Herald added a reviewer: rymiel. · View Herald Transcript
Herald added a subscriber: wangpc. · View Herald Transcript