This is an archive of the discontinued LLVM Phabricator instance.

clang-format: better handle statement macros
ClosedPublic

Authored by Typz on May 23 2017, 7:27 AM.

Details

Summary

Some macros are used in the body of function, and actually contain the trailing semicolon: they should thus be automatically followed by a new line, and not get merged with the next line. This is for example the case with Qt's Q_UNUSED macro:

  
void foo(int a, int b) {
  Q_UNUSED(a)
  return b;
}

This patch deals with these cases by introducing a new option to specify list of statement macros. This re-uses the system already in place for foreach macros, to ensure there is no impact on performance.

Diff Detail

Repository
rL LLVM

Event Timeline

Typz created this revision.May 23 2017, 7:27 AM
djasper edited edge metadata.May 24 2017, 4:52 AM

clang-format already has logic to detect semicolon-less macro invocations an in fact this already does behave as I would expect. What are you fixing?

Typz added a comment.May 24 2017, 5:04 AM

Without this patch, macros with no trailing semicolon _in the body of a function_ are not handled properly, so I get:

int foo(int a, int b) {
  Q_UNUSED(a) return b;
}

class Foo {
  void bar(int a, int b) { Q_UNUSED(a) Q_UNUSED(b) }
}

I don't. Only if they start out to be on the same line. As long as I start with:

class C {
  void foo(int a, int b) {
    Q_UNUSED(a)
    Q_UNUSED(a)
    return b;
  }
};

clang-format leaves this alone. That's good enough I think and we don't want to add more special handling for macro-DSLs than strictly necessary.

Typz added a comment.May 24 2017, 5:47 AM

Digging back, it seem the issue was that I had this code:

void foo(......) { Q_UNUSED(a) Q_UNUSED(b) }

which got wrapped because the line was too long:

void foo(......) {
  Q_UNUSED(a) Q_UNUSED(b)
}

I definitely understand your concern about introduceing special handling for some specific/hard-coded macro-DSLs, but would it be acceptable to add a configuration option to specify a list of such macros? e.g. a StatementMacros option, similar to the ForEachMacros option?

That would allow:

  • Systematically reformatting the first example
  • If such a code is reformatted, ensure the 2 macros are not on the same line

I generally would not be opposed to such a patch. However, note that this might be hard to get right. We had significant performance problems in the past with ForEachMacros as we used to match every single identifier against the regex stored in there. For for loops you can somewhat get out of that and you might be able to do the same thing here, but I am not entirely sure. In contrast, the added value is actually not very large. clang-format is merely not able to automatically fix something to your liking and it's very easy to make the code right and have clang-format keep it that way.

Typz updated this revision to Diff 103935.Jun 26 2017, 6:15 AM

Complete refactor to make the processing much more generic

Typz retitled this revision from clang-format: properly handle Q_UNUSED and QT_REQUIRE_VERSION to clang-format: better handle statement and namespace macros.Jun 26 2017, 6:18 AM
Typz edited the summary of this revision. (Show Details)
Typz edited the summary of this revision. (Show Details)
Typz updated this revision to Diff 103937.Jun 26 2017, 6:18 AM

Fix typo

So, there are two things in this patch: Statement macros and namespace macros. Lets break this out and handle them individually. They really aren't related that much.

Statement macros:
I think clang-format's handling here is good enough. clang-format does not insert the line break, but it also doesn't remove it. I am not 100% sure here, so I an be convinced. But I want to understand the use cases better. Do you expect people to run into this frequently? I am essentially trying to understand whether the cost of an extra option is worth the benefit it is giving.

Namespace macros:
How important are the automatic closing comments to you? I'd say that we should punt on that and leave it to the user to fix comments of these. And then, we could try to make the things we already have in MacroBlockBegin detect whether it ends with an opening brace and not need an extra list here. What do you think?

unittests/Format/FormatTest.cpp
1483 ↗(On Diff #103937)

What's the difference here?

Typz added a comment.Jul 18 2017, 6:18 AM

t>>! In D33440#812645, @djasper wrote:

So, there are two things in this patch: Statement macros and namespace macros. Lets break this out and handle them individually. They really aren't related that much.

Indeed, the only "relation" is the implementation, which now uses the exact same list (internally) to match all macros. Phabricator makes it very difficult to work with related changes pushed as multiple reviews, so I ended up merging the two features.

Statement macros:
I think clang-format's handling here is good enough. clang-format does not insert the line break, but it also doesn't remove it. I am not 100% sure here, so I an be convinced. But I want to understand the use cases better. Do you expect people to run into this frequently? I am essentially trying to understand whether the cost of an extra option is worth the benefit it is giving.

This happens relatively often, for example when fixing "unused parameter warning" on an inlined function: `int f(int a) { return 0; } often gets fixed to int f(int a) { Q_UNUSED(a) return 0; }` and clang-format does not fix the formatting...

Namespace macros:
How important are the automatic closing comments to you? I'd say that we should punt on that and leave it to the user to fix comments of these. And then, we could try to make the things we already have in MacroBlockBegin detect whether it ends with an opening brace and not need an extra list here. What do you think?

This is not just about automatic closing comments, there are may differences: indentation of namespaces, 'compacting' of namespaces when CompactNamespaces is used, detection of inline functions (for putting on a single line with SFS_Inline), handling of empty blocks (i.e. use BraceWrappingFlags.SplitEmptyNamespace)...

Typz updated this revision to Diff 114783.Sep 12 2017, 2:05 AM

Rebase to master to fix merge issue

I'd still prefer individual patches for each of these changes. If the code review system or VCS make it hard for you to deal with two adjacent changes this way, do them in sequence.

Adding Manuel as a reviewer who has a longer term idea on how to handle macros.

krasimir edited edge metadata.Sep 13 2017, 3:30 AM

Patch looks good, but I also would like to see it splited. I would suggest to first get the statement macro part in, which requires less code. Then we can put the namespace macros on top of that. I really like the generality of this approach and would want to also add support for class macros eventually.

lib/Format/FormatTokenLexer.cpp
641 ↗(On Diff #114783)

Please move this inside the following if statement, so that we only perform the search when we see a tok::pp_define.

lib/Format/NamespaceEndCommentsFixer.cpp
56 ↗(On Diff #114783)

I don't understand why you have a Tok ? ... after you assert(Tok && ...)?

82 ↗(On Diff #114783)

What does this comment refer to? If it's about the line above, consider moving it up.

100 ↗(On Diff #114783)

nit: end comment with .

105 ↗(On Diff #114783)

nit: end comment with .

155 ↗(On Diff #114783)

What happened to the old // Detect "(inline)? namespace" in the beginning of a line.

unittests/Format/FormatTest.cpp
1588 ↗(On Diff #114783)

Hm, what would happen if you have a namespace macro with two or more parameters?

Typz marked 8 inline comments as done.Sep 13 2017, 8:49 AM
Typz added inline comments.
lib/Format/NamespaceEndCommentsFixer.cpp
155 ↗(On Diff #114783)

Moved to FormatToken::getNamespaceToken()

unittests/Format/FormatTest.cpp
1588 ↗(On Diff #114783)

only the first argument is used, as seen in previous test case.

Typz updated this revision to Diff 115054.Sep 13 2017, 8:58 AM
Typz marked 2 inline comments as done.

Fix review comments, before splitting the commit.

Typz updated this revision to Diff 115057.Sep 13 2017, 9:23 AM

Split diff: handle only statements in here, namespace macros will be moved to another one.

Typz retitled this revision from clang-format: better handle statement and namespace macros to clang-format: better handle statement macros.Sep 13 2017, 9:29 AM
Typz edited the summary of this revision. (Show Details)
krasimir added inline comments.Sep 14 2017, 1:00 AM
unittests/Format/FormatTest.cpp
2407 ↗(On Diff #115057)

Please add tests where:

  • the macro occurs inside a function body, thus having nontrivial indentation
  • there are two macros one after another
  • there is some code before the macro (on the same line + on a previous line with the same level)
djasper added inline comments.Sep 14 2017, 2:29 AM
lib/Format/FormatTokenLexer.cpp
642 ↗(On Diff #115057)

This does a binary search. Why aren't you implementing it with a hashtable?

lib/Format/FormatTokenLexer.h
101 ↗(On Diff #115057)

What does this class do and why do we need it? Describe it's purpose in a comment.

102 ↗(On Diff #115057)

nullptr

109 ↗(On Diff #115057)

Are all of these used?

Typz updated this revision to Diff 115801.Sep 19 2017, 1:03 AM
Typz marked 5 inline comments as done.

Add tests.
Replace sorted list with hashtable.

lib/Format/FormatTokenLexer.cpp
642 ↗(On Diff #115057)

It was already done this way, so I did not change it to avoid any impact on performance.
But I can change it if you prefer.

Out of curiosity, will this be able to fix the two situations that you get for python extension?
There, you usually have a PyObject_HEAD with out semicolon in a struct and than a PyObject_HEAD_INIT(..) in a braced init list. More info:
http://starship.python.net/crew/mwh/toext/node20.html

lib/Format/FormatTokenLexer.cpp
642 ↗(On Diff #115057)

Thanks. This is much better than it was before.

630 ↗(On Diff #120084)

Just do:

auto it = Macros.find(FormatTok->Tok.getIdentifierInfo());
..

I know that this means that we might do the hash look up when we wouldn't need to (when we are actually in a #define), but I think clarity here is more important than the tiny performance benefit.

lib/Format/FormatTokenLexer.h
25 ↗(On Diff #120084)

Use ".

lib/Format/UnwrappedLineParser.cpp
1254 ↗(On Diff #120084)

This contains the exact same code (I think). Can you pull it out into a function?

Typz added a comment.Nov 9 2017, 5:00 AM

Out of curiosity, will this be able to fix the two situations that you get for python extension?
There, you usually have a PyObject_HEAD with out semicolon in a struct and than a PyObject_HEAD_INIT(..) in a braced init list. More info:
http://starship.python.net/crew/mwh/toext/node20.html

PyObject_HEAD is defined (by defaut) as:

Py_ssize_t ob_refcnt;
PyTypeObject *ob_type;

so declaring it as a statement macro should allow clang-format to process it correctly. (though this is a variable-like macro, so I need to check if this case is supported)

PyObject_HEAD_INIT ends with a comma, which is not supported yet. We would need to add another kind of macro to support that case.

Typz updated this revision to Diff 122228.Nov 9 2017, 5:38 AM
Typz marked 5 inline comments as done.

Address review comments

Typz marked an inline comment as done.Nov 9 2017, 5:41 AM
Typz updated this revision to Diff 132837.Feb 5 2018, 8:47 AM

Use StatementMacro detection to improve brace type detection heuristics (in UnwrappedLineParser::calculateBraceTypes).

alexfh added a subscriber: alexfh.Mar 23 2018, 5:14 AM
alexfh added inline comments.
lib/Format/Format.cpp
647–648 ↗(On Diff #132837)

What's the reason to have these in the LLVM style? The macros aren't used in LLVM code.

unittests/Format/FormatTest.cpp
2424 ↗(On Diff #132837)

nit: Add a trailing period.

Typz updated this revision to Diff 147024.May 16 2018, 2:38 AM

Rebase on latest master

There are still outstanding comments.

Typz updated this revision to Diff 147293.May 17 2018, 4:51 AM
Typz marked an inline comment as done.

Address review comments

Typz marked an inline comment as done.May 17 2018, 4:51 AM
Typz added inline comments.
lib/Format/Format.cpp
647–648 ↗(On Diff #132837)

This is similar to the default foreach macros (foreach, Q_FOREACH and BOOST_FOREACH) : the macros are added here so that they are the default values for any style, since LLVM style is also both the default style and the base style for any style.

Typz updated this revision to Diff 158308.Jul 31 2018, 9:50 AM
Typz marked an inline comment as done.

Regenerate documentation

krasimir accepted this revision.Sep 18 2018, 2:50 AM
This revision is now accepted and ready to land.Sep 18 2018, 2:50 AM
This revision was automatically updated to reflect the committed changes.

@Typz
I think this should be part of the release notes for v8.
This is changing the output on some code base and this is a new feature.

By the way, an example in the doc would be great (including with the configuration)

merci!

Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2019, 9:57 PM