This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Fix SeparateDefinitionBlocks issues
ClosedPublic

Authored by ksyx on Jan 5 2022, 7:29 AM.

Details

Summary

Fixes https://github.com/llvm/llvm-project/issues/52976.

  • Make no formatting for macros
  • Attach comment with definition headers
  • Make no change on use of empty lines at block start/end
  • Fix misrecognition of keyword namespace

Diff Detail

Event Timeline

ksyx requested review of this revision.Jan 5 2022, 7:29 AM
ksyx created this revision.
ksyx planned changes to this revision.Jan 5 2022, 7:39 AM

failed on dealing with classes.

ksyx updated this revision to Diff 397596.Jan 5 2022, 8:37 AM

Fix failing when the first line of body is comment.

ksyx updated this revision to Diff 397602.Jan 5 2022, 8:47 AM

apply clangfmt suggestions.

MyDeveloperDay added inline comments.Jan 5 2022, 9:32 AM
clang/lib/Format/DefinitionBlockSeparator.cpp
41–42

Style.isJavaScript()

42–43

do we see this as JsExtraKeywords.kw_fcuntion?

106

I might be having a Brain f**t, but Isn't this just OpterateIndex += Direction in which case do we need the if/else?

Sorry but I find the double negative hard to read.

111

Style.isCSharp() please

112

Do we see this as an TT_AttributeSquare, this could help reduce possible false positives?

115–119

I never know if we should be using {} here?

140–141

remove {}

148

any reason why we are not looking for TargetToken->isNot(tok::l_brace)?

158

remove {}?

}

#if 0
<--- extra unwanted newline
void
func()
{
}
<--- extra unwanted newline
#endif

void bar()
{
}
MyDeveloperDay added a comment.EditedJan 5 2022, 9:40 AM
namespace {
<--- extraneous newline will be removed (just checking if thats what we want?)  shouldn't it honour MaxEmptyLines?  
   /*  foo */
   void foo()
   {
   }
<--- extraneous newline will be removed (just checking if thats what we want?)  shouldn't it honour MaxEmptyLines?
}
MyDeveloperDay added a comment.EditedJan 5 2022, 9:40 AM
}

#if 0
<--- extra unwanted newline
void
func()
{
}
<--- extra unwanted newline
#else
<--- extra unwanted newline
void
bar()
{
}
<--- extra unwanted newline
#endif

void bar()
{
}

I highly recommend you running this on a large code base...I'm pulling this patch down and running it on my work code base, (because I want to use this feature).

Don't take my highlighting issues as anything other than "Go faster I like it... ;-)"

But this is an definitely and improvement, and this is a great feature (one I've been doing by hand for years, so I'm super excited.)

clang/lib/Format/DefinitionBlockSeparator.cpp
115–119

We shoudn't.

And I prefer prefix in-/decrement, instead of postfix.

ksyx updated this revision to Diff 397715.Jan 5 2022, 2:29 PM
ksyx marked 8 inline comments as done.
ksyx edited the summary of this revision. (Show Details)

Apply most suggestions

ksyx added a comment.Jan 5 2022, 2:31 PM
namespace {
<--- extraneous newline will be removed (just checking if thats what we want?)  shouldn't it honour MaxEmptyLines?  
   /*  foo */
   void foo()
   {
   }
<--- extraneous newline will be removed (just checking if thats what we want?)  shouldn't it honour MaxEmptyLines?
}

I did not come out with an implementation of this currently, but one alternative solution might be to make it leave the line spacing as original code?

clang/lib/Format/DefinitionBlockSeparator.cpp
42–43

Yes, but I have no idea about what to do with this.

curdeius edited the summary of this revision. (Show Details)Jan 6 2022, 1:04 AM
MyDeveloperDay added inline comments.Jan 6 2022, 1:11 AM
clang/lib/Format/DefinitionBlockSeparator.cpp
79

you need to add unit tests for this

if I have:

void foo()
{
}
<----- you are removing the empty line here
#if 1
void bar()
{
}
#endif

but I think we still want that line between them.

Personally I would NOT attempt to remove any existing lines if I have my settings as Always there should be not empty line removal in my view.

Only the "Never" case should ever be removing lines

The same the other side

#if 1
void bar()
{
}
#endif
<----- you are removing the empty line here
void baz()
{
}

and here

#endif
<----- you are removing the empty line here
namespace 
{
}
MyDeveloperDay requested changes to this revision.Jan 6 2022, 1:12 AM
This revision now requires changes to proceed.Jan 6 2022, 1:12 AM
ksyx updated this revision to Diff 397979.Jan 6 2022, 1:35 PM
ksyx marked an inline comment as done.
ksyx edited the summary of this revision. (Show Details)

Apply suggested behaviour and fix misrecognition of token namespace

ksyx edited the summary of this revision. (Show Details)Jan 6 2022, 1:36 PM
ksyx updated this revision to Diff 397986.Jan 6 2022, 2:12 PM

apply suggestion from clangfmt.

ksyx updated this revision to Diff 397989.Jan 6 2022, 2:15 PM

apply suggestion from clangfmt.

Do we have any ifdef tests?

clang/unittests/Format/DefinitionBlockSeparatorTest.cpp
171

ok, I'm not a fan of having lambda/functions in unit tests that get reused, this is just my experience that when this fails on line 161 you have no idea which test is failing..

Sorry from my perspective there is something wrong here..verifyFormat should be able to give you what you need. personally I'd rather have it verbatium

MyDeveloperDay accepted this revision.Jan 7 2022, 1:35 AM

Its definitely a lot better (LGTM), as the original changes have been committed I might suggest we land these changes to get us much closer. (and reduce the churn and comments on this review)

Whilst I think you have fixed

https://github.com/llvm/llvm-project/issues/52976

This shows some other failure scenarios

https://github.com/llvm/llvm-project/issues/45895

I kind of think those other failures might be better handled in other reviews? what do others think?

This revision is now accepted and ready to land.Jan 7 2022, 1:35 AM
HRESULT
Foo::func()
{
}

becomes

HRESULT

Foo::func() {}

Some nits only, otherwise no other comments from my part.

Its definitely a lot better (LGTM), as the original changes have been committed I might suggest we land these changes to get us much closer. (and reduce the churn and comments on this review)

+1

This shows some other failure scenarios

https://github.com/llvm/llvm-project/issues/45895

I kind of think those other failures might be better handled in other reviews? what do others think?

+1

When handling those, please split into separate issues.

clang/lib/Format/DefinitionBlockSeparator.cpp
116
142
clang/unittests/Format/DefinitionBlockSeparatorTest.cpp
63

Unrelated. Please move to another patch.

171

+1

ksyx added a comment.Jan 7 2022, 4:07 AM

This shows some other failure scenarios

https://github.com/llvm/llvm-project/issues/45895

If I am correct, the only one unfixed in this issue would be this improvement:

With AllowShortFunctionsOnASingleLine: Inline, we sometimes group such inline functions in class definitions together

ksyx updated this revision to Diff 398159.Jan 7 2022, 8:24 AM
ksyx marked 5 inline comments as done.
  • Use prefix inc/dec
  • Not run test in lambda
ksyx added a comment.Jan 7 2022, 8:25 AM
HRESULT
Foo::func()
{
}

becomes

HRESULT

Foo::func() {}

Not reproduced with the following code and command line:

int
foo(int x, int y)
{
  int r = x*y;
  return r;
}
void
bar()
{

}
bin/clang-format test.cpp -style='{AlwaysBreakAfterReturnType: All, SeparateDefinitionBlocks: Always, BreakBeforeBraces: Allman}'

But using HRESULT return type reproduced this, with -debug option it shows it as an identifier while this is not for primitive types like void.

ksyx updated this revision to Diff 398160.Jan 7 2022, 8:27 AM

rebase

Seem related to the number of characters used in the type?

void foo()
{}

LRES
Foo::func() {}

LRESU

Foo::func() {}
ksyx added a comment.Jan 7 2022, 8:43 AM

Seem related to the number of characters used in the type?

void foo()
{}

LRES
Foo::func() {}

LRESU

Foo::func() {}
----
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=identifier L=4 PPK=2 FakeLParens= FakeRParens=0 II=0x1592cb8 Text='LRES'
 M=1 C=1 T=FunctionDeclarationName S=1 F=0 B=0 BK=0 P=80 Name=identifier L=84 PPK=2 FakeLParens= FakeRParens=0 II=0x1592cf0 Text='barbabararr'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=23 Name=l_paren L=85 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=140 Name=r_paren L=86 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
----

============================

----
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=identifier L=5 PPK=2 FakeLParens= FakeRParens=0 II=0x2e8dcb8 Text='LRESU'
----
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=identifier L=11 PPK=2 FakeLParens= FakeRParens=0 II=0x2e8dcf0 Text='barbabararr'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=23 Name=l_paren L=12 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=140 Name=r_paren L=13 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
----

============================

----
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=void L=4 PPK=2 FakeLParens= FakeRParens=0 II=0x15a0618 Text='void'
 M=1 C=1 T=FunctionDeclarationName S=1 F=0 B=0 BK=0 P=80 Name=identifier L=84 PPK=2 FakeLParens= FakeRParens=0 II=0x15a9cc0 Text='barbabararr'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=23 Name=l_paren L=85 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=140 Name=r_paren L=86 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
----

What I observed from the debug output is that it has been split into two blocks for long type name in AnnotatedToken info, while the shorter one only have one block.

....
From UnwrappedLineParser.cpp

if (FollowedByNewline && (Text.size() >= 5 || FunctionLike) &&
            tokenCanStartNewLine(*FormatTok) && Text == Text.upper()) {
          addUnwrappedLine();
          return;
        }
ksyx added a comment.Jan 7 2022, 8:50 AM

https://github.com/llvm/llvm-project/commit/680b09ba8825311dc0fad69ed166a21e3f83bcbe

Seems like this is intent for dealing with macro but somehow went into return type

Without the SeparateDefinitionBlocks: Always the issue doesn't occur, this is unclear to me why

double
Func() {}

LRES
Foo::func() {}

LRESU
Foo::func() {}

I think that perhaps

if (Style.SeparateDefinitionBlocks != FormatStyle::SDS_Leave)
  Passes.emplace_back([&](const Environment &Env) {
    return DefinitionBlockSeparator(Env, Expanded).process();
  });

Should be moved to After the Formatter pass, thoughts?

ksyx added a comment.Jan 7 2022, 9:40 AM

I think that perhaps

if (Style.SeparateDefinitionBlocks != FormatStyle::SDS_Leave)
  Passes.emplace_back([&](const Environment &Env) {
    return DefinitionBlockSeparator(Env, Expanded).process();
  });

Should be moved to After the Formatter pass, thoughts?

It does not seem to be working. The problem could be that when I am asking about Line->MightBeFunctionDecl && Line->mightBeFunctionDefinition(), it will just return false when it is a single annotated line like LRESULT and return true for the line foo() following, so the program adds a new line above this "true function definition line". However, those like LRES foo() { would surely be treated as a normal definition line.

How about checking the line before in "LikelyDefinition", I may not sure I totally understand your logic... but

if (PreviousLine && !PreviousLine->endsWith(tok::semi) &&
        PreviousLine->startsWith(tok::identifier))
      return false;

So something like this

auto LikelyDefinition = [this](const AnnotatedLine *Line,
                               const AnnotatedLine *PreviousLine) {
  if (PreviousLine && !PreviousLine->endsWith(tok::semi) &&
      PreviousLine->startsWith(tok::identifier))
    return false;
  if ((Line->MightBeFunctionDecl && Line->mightBeFunctionDefinition()) ||
      Line->startsWithNamespace())
    return true;

  FormatToken *CurrentToken = Line->First;
  while (CurrentToken) {
    if (CurrentToken->isOneOf(tok::kw_class, tok::kw_struct, tok::kw_enum) ||
        (Style.isJavaScript() && CurrentToken->TokenText == "function"))
      return true;
    CurrentToken = CurrentToken->Next;
  }
  return false;
};

This fixed this capitialized return type > 5 chars (which for people using clang-format to format windows code with DWORD,DWORD_PTR,DOUBLE,HRESULT,LRESULT this would be pretty bad.

ksyx added a comment.Jan 7 2022, 11:22 AM

I see your point, but I think the problem is more likely to be coming from the token annotator that detected it as a macro and just start a new line?

https://github.com/llvm/llvm-project/commit/680b09ba8825311dc0fad69ed166a21e3f83bcbe

ksyx added a comment.Jan 10 2022, 8:10 AM

Hi, just to confirm, are there anything needed to be improved for this patch or it is ready to be committed? Thanks

clang/lib/Format/DefinitionBlockSeparator.cpp
165

Otherwise it is really not clear what that zero should mean.

ksyx updated this revision to Diff 398719.Jan 10 2022, 12:30 PM
ksyx marked an inline comment as done.
  • add comments for direction = 0
  • use true/false instead of 0/1
  • rebase
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJan 11 2022, 9:25 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript