clang-format: better handle statement macros
Needs ReviewPublic

Authored by Typz on May 23 2017, 7:27 AM.

Details

Summary

Some macros are used in the body of function, and actually contain the trailing semicolon: they should thus be automatically followed by a new line, and not get merged with the next line. This is for example the case with Qt's Q_UNUSED macro:

  
void foo(int a, int b) {
  Q_UNUSED(a)
  return b;
}

This patch deals with these cases by introducing a new option to specify list of statement macros. This re-uses the system already in place for foreach macros, to ensure there is no impact on performance.

Typz created this revision.May 23 2017, 7:27 AM

clang-format already has logic to detect semicolon-less macro invocations an in fact this already does behave as I would expect. What are you fixing?

Typz added a comment.May 24 2017, 5:04 AM

Without this patch, macros with no trailing semicolon _in the body of a function_ are not handled properly, so I get:

int foo(int a, int b) {
  Q_UNUSED(a) return b;
}

class Foo {
  void bar(int a, int b) { Q_UNUSED(a) Q_UNUSED(b) }
}

I don't. Only if they start out to be on the same line. As long as I start with:

class C {
  void foo(int a, int b) {
    Q_UNUSED(a)
    Q_UNUSED(a)
    return b;
  }
};

clang-format leaves this alone. That's good enough I think and we don't want to add more special handling for macro-DSLs than strictly necessary.

Typz added a comment.May 24 2017, 5:47 AM

Digging back, it seem the issue was that I had this code:

void foo(......) { Q_UNUSED(a) Q_UNUSED(b) }

which got wrapped because the line was too long:

void foo(......) {
  Q_UNUSED(a) Q_UNUSED(b)
}

I definitely understand your concern about introduceing special handling for some specific/hard-coded macro-DSLs, but would it be acceptable to add a configuration option to specify a list of such macros? e.g. a StatementMacros option, similar to the ForEachMacros option?

That would allow:

  • Systematically reformatting the first example
  • If such a code is reformatted, ensure the 2 macros are not on the same line

I generally would not be opposed to such a patch. However, note that this might be hard to get right. We had significant performance problems in the past with ForEachMacros as we used to match every single identifier against the regex stored in there. For for loops you can somewhat get out of that and you might be able to do the same thing here, but I am not entirely sure. In contrast, the added value is actually not very large. clang-format is merely not able to automatically fix something to your liking and it's very easy to make the code right and have clang-format keep it that way.

Typz updated this revision to Diff 103935.Jun 26 2017, 6:15 AM

Complete refactor to make the processing much more generic

Typz retitled this revision from clang-format: properly handle Q_UNUSED and QT_REQUIRE_VERSION to clang-format: better handle statement and namespace macros.Jun 26 2017, 6:18 AM
Typz edited the summary of this revision. (Show Details)
Typz edited the summary of this revision. (Show Details)
Typz updated this revision to Diff 103937.Jun 26 2017, 6:18 AM

Fix typo

So, there are two things in this patch: Statement macros and namespace macros. Lets break this out and handle them individually. They really aren't related that much.

Statement macros:
I think clang-format's handling here is good enough. clang-format does not insert the line break, but it also doesn't remove it. I am not 100% sure here, so I an be convinced. But I want to understand the use cases better. Do you expect people to run into this frequently? I am essentially trying to understand whether the cost of an extra option is worth the benefit it is giving.

Namespace macros:
How important are the automatic closing comments to you? I'd say that we should punt on that and leave it to the user to fix comments of these. And then, we could try to make the things we already have in MacroBlockBegin detect whether it ends with an opening brace and not need an extra list here. What do you think?

unittests/Format/FormatTest.cpp
1588

What's the difference here?

Typz added a comment.Jul 18 2017, 6:18 AM

t>>! In D33440#812645, @djasper wrote:

So, there are two things in this patch: Statement macros and namespace macros. Lets break this out and handle them individually. They really aren't related that much.

Indeed, the only "relation" is the implementation, which now uses the exact same list (internally) to match all macros. Phabricator makes it very difficult to work with related changes pushed as multiple reviews, so I ended up merging the two features.

Statement macros:
I think clang-format's handling here is good enough. clang-format does not insert the line break, but it also doesn't remove it. I am not 100% sure here, so I an be convinced. But I want to understand the use cases better. Do you expect people to run into this frequently? I am essentially trying to understand whether the cost of an extra option is worth the benefit it is giving.

This happens relatively often, for example when fixing "unused parameter warning" on an inlined function: `int f(int a) { return 0; } often gets fixed to int f(int a) { Q_UNUSED(a) return 0; }` and clang-format does not fix the formatting...

Namespace macros:
How important are the automatic closing comments to you? I'd say that we should punt on that and leave it to the user to fix comments of these. And then, we could try to make the things we already have in MacroBlockBegin detect whether it ends with an opening brace and not need an extra list here. What do you think?

This is not just about automatic closing comments, there are may differences: indentation of namespaces, 'compacting' of namespaces when CompactNamespaces is used, detection of inline functions (for putting on a single line with SFS_Inline), handling of empty blocks (i.e. use BraceWrappingFlags.SplitEmptyNamespace)...

Typz updated this revision to Diff 114783.Sep 12 2017, 2:05 AM

Rebase to master to fix merge issue

I'd still prefer individual patches for each of these changes. If the code review system or VCS make it hard for you to deal with two adjacent changes this way, do them in sequence.

Adding Manuel as a reviewer who has a longer term idea on how to handle macros.

Patch looks good, but I also would like to see it splited. I would suggest to first get the statement macro part in, which requires less code. Then we can put the namespace macros on top of that. I really like the generality of this approach and would want to also add support for class macros eventually.

lib/Format/FormatTokenLexer.cpp
631

Please move this inside the following if statement, so that we only perform the search when we see a tok::pp_define.

lib/Format/NamespaceEndCommentsFixer.cpp
56 ↗(On Diff #114783)

I don't understand why you have a Tok ? ... after you assert(Tok && ...)?

82 ↗(On Diff #114783)

What does this comment refer to? If it's about the line above, consider moving it up.

100 ↗(On Diff #114783)

nit: end comment with .

105 ↗(On Diff #114783)

nit: end comment with .

155 ↗(On Diff #114783)

What happened to the old // Detect "(inline)? namespace" in the beginning of a line.

unittests/Format/FormatTest.cpp
1588

Hm, what would happen if you have a namespace macro with two or more parameters?

Typz marked 8 inline comments as done.Sep 13 2017, 8:49 AM
Typz added inline comments.
lib/Format/NamespaceEndCommentsFixer.cpp
155 ↗(On Diff #114783)

Moved to FormatToken::getNamespaceToken()

unittests/Format/FormatTest.cpp
1588

only the first argument is used, as seen in previous test case.

Typz updated this revision to Diff 115054.Sep 13 2017, 8:58 AM
Typz marked 2 inline comments as done.

Fix review comments, before splitting the commit.

Typz updated this revision to Diff 115057.Sep 13 2017, 9:23 AM

Split diff: handle only statements in here, namespace macros will be moved to another one.

Typz retitled this revision from clang-format: better handle statement and namespace macros to clang-format: better handle statement macros.Sep 13 2017, 9:29 AM
Typz edited the summary of this revision. (Show Details)
krasimir added inline comments.Sep 14 2017, 1:00 AM
unittests/Format/FormatTest.cpp
2442

Please add tests where:

  • the macro occurs inside a function body, thus having nontrivial indentation
  • there are two macros one after another
  • there is some code before the macro (on the same line + on a previous line with the same level)
djasper added inline comments.Sep 14 2017, 2:29 AM
lib/Format/FormatTokenLexer.cpp
634–635

This does a binary search. Why aren't you implementing it with a hashtable?

lib/Format/FormatTokenLexer.h
102

What does this class do and why do we need it? Describe it's purpose in a comment.

103

nullptr

110

Are all of these used?

Typz updated this revision to Diff 115801.Sep 19 2017, 1:03 AM
Typz marked 5 inline comments as done.

Add tests.
Replace sorted list with hashtable.

lib/Format/FormatTokenLexer.cpp
634–635

It was already done this way, so I did not change it to avoid any impact on performance.
But I can change it if you prefer.

Out of curiosity, will this be able to fix the two situations that you get for python extension?
There, you usually have a PyObject_HEAD with out semicolon in a struct and than a PyObject_HEAD_INIT(..) in a braced init list. More info:
http://starship.python.net/crew/mwh/toext/node20.html

lib/Format/FormatTokenLexer.cpp
630

Just do:

auto it = Macros.find(FormatTok->Tok.getIdentifierInfo());
..

I know that this means that we might do the hash look up when we wouldn't need to (when we are actually in a #define), but I think clarity here is more important than the tiny performance benefit.

634–635

Thanks. This is much better than it was before.

lib/Format/FormatTokenLexer.h
25

Use ".

lib/Format/UnwrappedLineParser.cpp
1249

This contains the exact same code (I think). Can you pull it out into a function?

Typz added a comment.Thu, Nov 9, 5:00 AM

Out of curiosity, will this be able to fix the two situations that you get for python extension?
There, you usually have a PyObject_HEAD with out semicolon in a struct and than a PyObject_HEAD_INIT(..) in a braced init list. More info:
http://starship.python.net/crew/mwh/toext/node20.html

PyObject_HEAD is defined (by defaut) as:

Py_ssize_t ob_refcnt;
PyTypeObject *ob_type;

so declaring it as a statement macro should allow clang-format to process it correctly. (though this is a variable-like macro, so I need to check if this case is supported)

PyObject_HEAD_INIT ends with a comma, which is not supported yet. We would need to add another kind of macro to support that case.

Typz updated this revision to Diff 122228.Thu, Nov 9, 5:38 AM
Typz marked 5 inline comments as done.

Address review comments

Typz marked an inline comment as done.Thu, Nov 9, 5:41 AM