This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Add a NamespaceEndCommentsFixer
ClosedPublic

Authored by krasimir on Feb 22 2017, 12:47 PM.

Details

Summary

This patch adds a NamespaceEndCommentsFixer TokenAnalyzer for clang-format,
which fixes end namespace comments.
It currently supports inserting and updating existing wrong comments.

Example source:

namespace A {
int i;
}

namespace B {
int j;
} // namespace A

after formatting:

namespace A {
int i;
} // namespace A

namespace B {
int j;
} // namespace B

Event Timeline

krasimir created this revision.Feb 22 2017, 12:47 PM
krasimir edited the summary of this revision. (Show Details)Feb 22 2017, 12:49 PM
krasimir added reviewers: djasper, klimek.
krasimir updated this revision to Diff 89413.Feb 22 2017, 12:55 PM
  • Reformatted unit tests
djasper edited edge metadata.Feb 23 2017, 12:23 AM

I started reviewing, but after a while it occurred to me that some of this might not be the right approach. We are essentially duplicating the parsing of namespaces, which is already done in UnwrappedLineParser. Lets instead just store more information in the AnnotatedLines (or FormatTokens) to carry the required information forward.

Specifically, I think we should do two things:

  1. Change UnwrappedLineFormatter::parseNamespace() to carry more information of the parse namespace.
  2. Change UnwrappedLineFormatter entirely to compute and store information about matched braces. This can (I think) be store in the AnnotatedLines as we have pairs of lines that belong together.
lib/Format/NamespaceEndCommentsFixer.cpp
38

Why do you match so precisely? Why not just return all the non-comment tokens between "namespace" and "{"? What should happen if a namespace declaration doesn't match?

110

I think you should remove SpaceBefore. clang-format will subsequently clean this up. Or, alternatively, you should use clang-format Style option SpacesBeforeTrailingComments.

111

I think it's more efficient to construct strings like this with llvm::Twine

116

No braces for single-statement ifs.

184

I think we should always update with a //-comment

197

I'd find the control flow easier to read as:

if (!hasEndComment()) {
  addEndComment(Fixes);
  return;
}
if (!validEndComment()) {
  updateEndComment(Fixes);
}
230

This needs the comment. Also, why "append" and why don't you return a vector of namespace?

232

Why unique_ptr<Namespace> instead of just Namespace?

klimek added inline comments.Feb 23 2017, 12:38 AM
lib/Format/NamespaceEndCommentsFixer.cpp
111

Generally, you don't want to use Twine outside of a single statement.
Twine a = Twine("a") + Twine("b");
is a use-after-free, because 'a' keeps a pointer to the temporaries.

krasimir updated this revision to Diff 89487.Feb 23 2017, 2:46 AM
krasimir marked 5 inline comments as done.
  • Address review comments

@djasper: I don't entirely understand:

  1. Change UnwrappedLineFormatter entirely to compute and store information about matched braces. This can (I think) be store in the AnnotatedLines as we have pairs of lines that belong together.

Does that mean that an unwrapped line cannot contain more than one brace? I'm also not sure how we can test such a change.

An unwrapped line can contain many braces, but (I think/hope) only one that belongs to a non-nested block. So every unwrapped line that opens a namespace/class/function/enum/... should have exactly one counterpart that closes it again.

Now, there are nested blocks, e.g. in lambdas, but for those, it's trivial because the opening and closing braces will be right next to each other (and also we can completely ignore for namespaces).

You can write a TestTokenAnalyzer to get the UnwrappedLines for a snippet of code and write unit tests using that.

Also a line may both close and open a block, as the } else { line in:

if (c()) {
  a();
} else {
  b();
}

I still need a bit of explanation. Here's my reasoning:

  1. Now, FormatToken already has a member MatchingParen, so the most straightforward thing would be to store the information at the FormatToken level.

However, that would directly link tokens from different unwrapped lines, which is not a sound design, because we want to keep unwrapped lines relatively separated?

  1. For this reason it might be better to store the matching info in the unwrapped line?

Is this correct?

krasimir updated this revision to Diff 89667.Feb 24 2017, 6:46 AM
  • Make the parser do the matching of begin/end line blocks
krasimir updated this revision to Diff 89668.Feb 24 2017, 6:55 AM
  • Make the namespace name matcher simpler
krasimir marked an inline comment as done.Feb 24 2017, 6:55 AM

I think this is ready for another range of reviews!

And of course, the remaining question is how exactly do we integrate this into clang-format. Do we make the namespace fixer always fix namespace comments during reformat()?

krasimir updated this revision to Diff 89853.Feb 27 2017, 2:39 AM
  • Add tests for macros
djasper added inline comments.Feb 27 2017, 2:44 AM
lib/Format/NamespaceEndCommentsFixer.cpp
35

I don't think you'll ever need the actual tokens. How about just returning the name here?

58

I’d leave the entire CRLF handling out. We must not replicate this N times. For now, I’d rely on the subsequent formatting step to tidy this up and if there are reasons not to do that, we’d want to have this functionality in WhitespaceManager or such.

82

I think you can also have anonymous namespaces that require a subsequent newline, so returning earlier is probably wrong.

163

Don't test this here. Only call fixEndComment when you actually know you want to fix.

165

Similarly, don’t test isShort here, test it outside.

187

I'd write this as:

if (!AnnotatedLines[i].Affected || !AnnotatedLines[i].startsWith(tok::r_brace))
  continue;
const AnnotatedLine *EndLine = AnnotatedLines[i];
size_t StartLineIndex = EndLine->MatchingOpeningBlockLine;
if (StartLineIndex == UnwrappedLine::kInvalidIndex)
  continue;
assert(StartLineIndex < E);
const AnnotatedLine *StartLine = AnnotatedLines[StartLineIndex];
if (!StartLine->startsWith(tok::kw_namespace) &&
    !StartLine->startsWith(tok::kw_inline, tok::kw_namespace))
  continue;

...
211

So, what happens if (I+1 >= E)? Wouldn’t you then have an uninitialized value in NextAfterEndComment that you then dereference? The only thing you are using this for is to compute whether or not you need a newline. I’d compute this here. Then all of the other functions can be entirely ignorant about the next line. The design principle I’d use here is to pass a long as little information as necessary.

213

Once you have done most of the above, I think the entire structure NamespaceBlock becomes unnecessary. All you really need in fixEndComment is:

  • The namespace name
  • The r_brace token
  • Whether or not you need to add a newline
krasimir updated this revision to Diff 89867.Feb 27 2017, 5:21 AM
krasimir marked 8 inline comments as done.
  • Remove NamespaceBlock and address review comments
lib/Format/NamespaceEndCommentsFixer.cpp
165

I can't really test for isShort outside because I still want to update wrong namespace end comments for short namespaces.
The combination of your two comments has the affect to make fixComment virtually useless, I'll just inline it.

187

I'll rewrite it as you suggest except for the last if statement: I need the namespace token later for generating the name, so I'll have to skip over the namespace and inline anyways.

213

plus the namespace token to determine if the block is short. I guess I'll just inline the definition of the namespace block.

krasimir updated this revision to Diff 89868.Feb 27 2017, 5:25 AM
  • Removed redundant variable StartLine

I think this is ready for another round of reviews. Sorry for the overly complicated design.

djasper accepted this revision.Feb 27 2017, 5:26 AM

Looks good.

About hooking it up to clang-format: I suggest a boolean flag for now to enable/disable it.

lib/Format/NamespaceEndCommentsFixer.cpp
48

Remove

This revision is now accepted and ready to land.Feb 27 2017, 5:26 AM

Do we want this on or off by default?

On by default for LLVM and Google style, don't know about the others.

krasimir updated this revision to Diff 89870.Feb 27 2017, 5:37 AM
  • Remove redundant semicolon
This revision was automatically updated to reflect the committed changes.