This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] [PR44189] Remove space between identifier and scope resolution operator unless the identifier is a macro
Needs RevisionPublic

Authored by Bouska on Jan 4 2020, 3:06 PM.

Details

Summary

Currently, if there is space between an identifier and a :: operator in the original code, that space is kept in case the identifier is a macro that isn't part of the following nested name specifier. The issue is for the identifiers that are part of the nested name specifier.

I tried to find an heuristic to determine if a :: operator was a global specifier, but that is impossible with only syntactical analysis. Instead, I went for a solution creating a option containing all the macro identifiers that shouldn't not be merge to the following nested name specifier. Usually, those macros are hiding keywords (and attributes) so I called the option KeywordMacros but it might be a bad name. If an identifier is not listed in that list, it is assumed that it is part of the following nested name specifier and merged with it.

It's still a work in progress. Function declaration like this

int ::test::test()

seems to be legal (but not if the return type has a nested name specifier), but it's not handled by my change. I tried checking if the line is MightBeFunctionDecl and the identifier is the first token of the line but that doesn't work.

This patch fixes bug #44189

Diff Detail

Event Timeline

Bouska created this revision.Jan 4 2020, 3:06 PM
MyDeveloperDay requested changes to this revision.Jan 5 2020, 8:24 AM

is #44192 the correct bug?

clang/include/clang/Format/Format.h
1308

You need generate the change to the documentation and the release notes

clang/unittests/Format/FormatTest.cpp
165

what happens if you don't have the KeywordMacro? what would the reformatting look like?

I'm a little concerned that alot of code could change with this change if people don't go around adding all their KeywordMacros, have you tried this on a large code base like LLVM to understand the impact?

This revision now requires changes to proceed.Jan 5 2020, 8:24 AM
Bouska edited the summary of this revision. (Show Details)Jan 5 2020, 9:25 AM
Bouska marked an inline comment as done.Jan 5 2020, 11:26 AM
Bouska added inline comments.
clang/unittests/Format/FormatTest.cpp
165

It would look like this:

ALWAYS_INLINE::std::string getName();

(cf bug 31951)

Basically, we are going back to pre clang 4.0 behaviour. I checked on LLVM code base (checked manually so definitely not exhaustive), and so far I only found two changes on buggy syntax (a space inside a nested name specifier that shouldn't be there) and also a bug (no related to me change and still have to report it), so that is pretty positive. I should try on a Google code base as the initial bug was reported by someone from there.

is #44192 the correct bug?

My bad, I made a bad copy paste. Correct bug is #44189 (like in the title). Fixed the summary.

MyDeveloperDay added inline comments.Jan 7 2020, 12:41 PM
clang/lib/Format/TokenAnnotator.cpp
3042

I'm a bit confused by the example in your bug report. I don't think the original change is the problem but more that you are using the word "class" which is not an tok::identifier but a tok::class

if in your example it had been

int A::test()

Then I think it wouldn't have broken it into int A ::test()

I'm sorry but I feel this mechanism is MUCH more of a problem for everyone else, and could generate much more of a problem (not to mention revert the original bug unless all macros are put back.)

I don't want to discourage you, perhaps I don't truly understand, but something doesn't feel quite right, maybe if you added more tests showing the problem then it would be easier

clang/unittests/Format/FormatTest.cpp
165

I think by definition that's going to be a regression if we do this... we have to find a better way.

I really do NOT like changing existing tests, it just leaves a bad taste in my mouth, and from my experience just ends up with a change being reverted

Bouska marked an inline comment as done.Jan 7 2020, 2:48 PM
Bouska added inline comments.
clang/lib/Format/TokenAnnotator.cpp
3042

My example was bad, I put a new one.

My issue is I that have a code base where some methods are with this syntax:

int MyClass :: MyFunction()

so with a space before and after the :: operator, which I don't want. Currently, the functions spaceRequiredBefore() & spaceRequiredBetween(), remove the space after the :: operator (line 2653) and keep the space before if there is already a space (line 3045, the one I'm changing). So I end up with the following:

int MyClass ::MyFunction()

Before clang 4.0, the space before the :: operator was removed if it followed an identifier. As reported by bug 31951, this is a bad behaviour in some cases as the :: operator could be global specifier and we don't want to merge it with preceding identifier (most probably a macro as stated in comment line 3042).

Using the :: operator as a global specifier is pretty uncommon even more to be following a identifier (I found 0 occurrence in LLVM code base). So my idea is to go back to pre clang 4.0 behaviour but to give an option to add that space for the users that need it. Basically, I'm removing a regression (from clang pre 4) I can do nothing about (apart from fixing manually my code), and adding a regression that can be easily fixed by adding an option parameter. I would prefer a better solution but for now I didn't found it.

Bouska marked an inline comment as done.Jan 7 2020, 3:13 PM
Bouska added inline comments.
clang/unittests/Format/FormatTest.cpp
165

Indeed, it's going to create a regression. The only way to mitigate it is to add to the default config the most used macros but it still wouldn't be perfect.

The issue is that clang-format cannot distinguish the two following function declaration:

ALWAYS_INLINE ::std::string CONST_MACRO ::test::function()
ns::subns::string ns::subns::function()

unless the developer tells clang-format. The current way of telling clang-format is just to keep a space if there is already one, the way I'm proposing is to add a option listing every identifier clang-format has to put a space. Both of methods has its caveats.

MyDeveloperDay added inline comments.Jan 7 2020, 3:34 PM
clang/unittests/Format/FormatTest.cpp
165

the fact that the current behaviour keeps the space if there is one and doesn't add one if there isn't, feels IMHO to be more correct. I don't think that potentially a large number of users need to add a large number of macros just to keep current behaviour is a good idea.

I'm really just a regular contributor, so if you feel this still has merits, then I recommend appealing to a higher authority like @klimek or @djasper

klimek added inline comments.Jan 8 2020, 1:00 AM
clang/unittests/Format/FormatTest.cpp
165

Agreed. I think the workaround is to use a sed script once on the code base :(

Bouska marked an inline comment as done.Jan 11 2020, 5:02 AM
Bouska added inline comments.
clang/unittests/Format/FormatTest.cpp
165

I checked LLVM code and Chromium code for this situation (an attribute macro in front of a global specifier), and I found 0 occurrence in LLVM and only one in Chromium, that is why I proposed this patch. But I get that a regression is a regression. I'll improve a bit this revision (so it's in a usable stable if someone want to cherry pick it) and I'll close this review as I see no way to implement this change without adding a regression.

Could the behaviour be something like this: if the user didn't specify the KeywordMacros (it's empty), then keep the current behaviour (add a space). This avoids introducing a regression.
If the option specified (possibly with a dummy macro name of you wanna to enable it unconditionally), then consider this list as exhaustive and put a space only after these macros. WDYT? @Bouska