Page MenuHomePhabricator

[clang-format] Add East/West Const fixer capability
Changes PlannedPublic

Authored by MyDeveloperDay on Nov 3 2019, 7:52 AM.

Details

Summary

Developers these days seem to argue over east vs west const like they used to argue over tabs vs whitespace or the various bracing style. These previous arguments were mainly eliminated with tools like clang-format that allowed those rules to become part of your style guide. Anyone who has been using clang-format in a large team over the last couple of years knows that we don't have those religious arguments any more, and code reviews are more productive.

https://www.youtube.com/watch?v=fv--IKZFVO8
https://mariusbancila.ro/blog/2018/11/23/join-the-east-const-revolution/
https://www.youtube.com/watch?v=z6s6bacI424

The purpose of this revision is to try to do the same for the East/West const discussion. Move the debate into the style guide and leave it there!

In addition to the new ConstStyle: Right or ConstStyle: Left there is an additional command-line argument --const-style=left/right which would allow an individual developer to switch the source back and forth to their own style for editing, and back to the committed style before commit. (you could imagine an IDE might offer such a switch)

The revision works by implementing a separate pass of the Annotated lines much like the SortIncludes and then create replacements for constant type declarations.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
MyDeveloperDay retitled this revision from [clang-format] Add Left/Right Const fixer capability to [clang-format] Add East/West Const fixer capability.May 24 2020, 3:08 PM

Add more test cases
Cover more template and namespace cases

MyDeveloperDay planned changes to this revision.May 24 2020, 3:18 PM

@steveire I'm still working on this I have just one issue from your lit that is failing (see below), but I wanted to capture the other changes in the review.

const Foo<Foo<int>>* p = const_cast<const Foo<Foo<int>>*>(&ffi);
klimek added inline comments.May 25 2020, 4:24 AM
clang/docs/ClangFormatStyleOptions.rst
1378

I'd be fine with only having left/right; my personal feeling is also that west-const / east-const has kinda become a term of art, though, so I really don't know which one is "right" :)

Generally, I think this is one of the cases where, given good docs, we're quickly spending more engineering hours discussing the right solution than it'll cost aggregated over all future users, under the assumption that people do not write new configs very often, and the few who will, will quickly remember.

MyDeveloperDay added a comment.EditedMay 25 2020, 5:05 AM
Foo<const Foo<int>> P; 

// This is needed to trigger the above 'const const' bug above:
#if 0
#else
#endif

produces

Foo<Foo<int> const const> P;

This bug is caused by the way clang-format re-parses the code for each conditional that the preprocessor caused (I actually didn't know it did this, or why)

The more conditional branches we introduce the more this goes wrong...

// 'const >' inserted instead of 'const':
Foo<Foo<int> const const const const> P;

#if 0
#if 1
#else
#endif
#else
#endif

I'm not yet sure of the solution or the reason.

Handle more cases,

Currently on this case is failing, the preprocess lines are causing multiple replacements which means const gets move more than once.

verifyFormat("Foo<const Foo<int>> P;\n#if 0\n#else\n#endif", "Foo<Foo<int> const> P;\n#if 0\n#else\n#endif", Style);
MyDeveloperDay planned changes to this revision.May 25 2020, 10:12 AM

Thanks @MyDeveloperDay for hammering on on these bugs and @steveire for finding them! There's still obviously some risk here but as long as this is opt-in for a major release or two (i.e. not turned on in built-in styles, any remaining bugs should get flushed out.

Regarding option naming, I did think East/West (only) was the way to go but have reluctantly changed my mind after rereading the discussion. My conclusions were:

  • C++ people who have read discussions on this from 2018 on are likely familiar with "east const" terminology
  • there are people who care about style who aren't familiar with it and wouldn't be happy to have to learn/adopt it (this was the main surprise for me)
  • 5 years from now, this naming may have spread to ~everyone, may remain partially-adopted, or possibly even die out as a fad
  • for people familiar with the terminology: "ConstStyle: East" is clearer than "ConstStyle: Right" (less ambiguous)
  • for people unfamiliar with the terminology, the opposite is certainly true
  • asymmetry 1: it's probably harder to work out east=right than to to work out that "ConstStyle: right" means const is written on the right of the type
  • asymmetry 2: the new terminology is objectively better: terse, memorable, doesn't collide with other terms. Some dislike it, which is true of any distinctive name (I hate "namespace", for instance).
  • there's clearly a cultural tension between reading like a meme-infested subreddit and like an IBM technical manual :-)

It's a tough call, but I'd propose a compromise: make Left/Right canonical as it's more accessible (don't have to learn new things) and in case East/West dies out.
But to have East/West as aliases: to let the community decide over time, and because I don't think we should be in the business of hindering adoption of better names.

("reluctantly" changed my mind because I do think east/west are better names. But meeting users where they are is important too - otherwise we'd just hardcode the One True Style :-))

aaron.ballman added inline comments.May 26 2020, 6:30 AM
clang/lib/Format/EastWestConstFixer.cpp
139 ↗(On Diff #266046)

Do you need to look for restrict here as well as volatile?

150 ↗(On Diff #266046)

Comments should start with a capital letter and end with punctuation per the coding standard (same applies to other comments in the patch).

157–158 ↗(On Diff #266046)

What about something like const unsigned long long or the less-common-but-equally-valid long const long unsigned? Or multiple qualifiers like unsigned volatile long const long * restrict

clang/lib/Format/Format.cpp
2445

clang-format's isCpp() covers C and C++ (and ObjectiveC and ObjectiveC++)

Wow, that's a really poorly named function then! Thank you for the clarification.

@MyDeveloperDay Thanks for the update. I pinged you on slack about this, but I guess you're not using it at the moment. I asked if you have a git branch somewhere with this change. Downloading patches from phab is such a pain I have no idea why we use it.

If you can link me to a branch somehow, I can re-test this.

Regarding

#if 0
#else
#endif

blocks causing multiple re-parses, presumably this is because clang-format formats code in the "other" preprocessor branch? At least I think it reformats comments in that case. Maybe the problem can be worked around with that in mind.

clang/docs/ClangFormatStyleOptions.rst
1378

I'd be fine with only having left/right; my personal feeling is also that west-const / east-const has kinda become a term of art, though, so I really don't know which one is "right" :)

This reminds me of the joke that Americans drive on the "Right" side of the road, and English drive on the "Correct" side. Sort of gives a different meaning to ConstStyle : Right and that the alternative is Wrong :). Maybe that language ambiguity is why East/West caught on.

people do not write new configs very often

Agreed. It seems a small number of strong views might influence this enough to make East/West not be used. What a pity.

MyDeveloperDay marked an inline comment as done.May 26 2020, 12:42 PM
MyDeveloperDay added inline comments.
MyDeveloperDay added a comment.EditedMay 26 2020, 1:06 PM

I really do appreciate the reviews and the comments especially regarding east/west/left/right/wrong ;-), I know there won't be 100% agreement, but I think most people who even considered cloning the LLVM repo know what we mean by East/West. As such I'm going to leave the internal code that way for now. (and the name of the TokenAnalyzer Files)

However, I have added support for Left/Right and East/West in the config, but I'm going to refrain from adding Before/After otherwise it just gets silly having too many options.

Whilst I've been developing this I've tried both ways, to be honest, I instinctively know what is meant by East Const because its the style I don't use myself (don't shoot me for that comment). But when talking in terms of Left/Right I feel I have to think about what it means quite hard. Especially as Right feels Wrong to me too!

Let me reiterate my goal. For me, the goal was to make east/west const conversations disappear in the same way that tab and whitespace conversations have somewhat disappeared, because I think these conversations are a waste of good conference time.

The answer should just be "use clang-format", and for me, this is part of my own feeling that we should "clang-format all the things". I feel the best compromise is to offer both configurations, (and I'll likely update the documentation to not have so much of a bias)

If we can agree to that, then I'll be happy in a year or so to do a poll of the .clang-format files in github.com and then change the internal code to match the majority, to me that would be the greatest measure of which should be the primary option.

Apart from that, I've still some work to do here, this case from @steveire is really stumping me. Every preprocessor branch causes a different iteration over the same original code and as such this is causing multiple replacements to be added as each pass reanalyses the original code and doesn't regenerate the code in between (super odd)

verifyFormat("Foo<const Foo<int>> P;\n#if 0\n#else\n#endif", "Foo<Foo<int> const> P;\n#if 0\n#else\n#endif", Style);

I tried to only perform the replacement on the first pass, but alas that means if I put any declarations inside the preprocess clauses they actually don't get converted. (I'm not sure if anyone has seen this before).

In the background, I've been testing this on LLVM itself (which isn't easy because the code isn't all clang-formatted itself, another pet peeve of mine), apart from the #if/#else issue (which hits lib/Analyzer/CFG.cpp and a few other files) it seems to work pretty well (although I've not done a detailed walkthrough to see if it's missing much)

I would like to land this at some point, but cannot whilst I still have the #if/#else issue

Thanks for the help and the feedback, just wanted to give everyone a progress report.

if I put any declarations inside the preprocess clauses they actually don't get converted.

Sorry, I'm not certain what this means. Does it mean that if you have

#if 0
Foo<const Foo<int>> P;
#else
Foo<const Foo<int>> P;
#endif

that neither of them get converted?

Can you point me to a git branch I can use to try this out? The last patch I tried to download from phab didn't apply cleanly. If you have a git branch I can rebase it with more confidence.

I really do appreciate the reviews and the comments especially regarding east/west/left/right/wrong ;-), I know there won't be 100% agreement, but I think most people who even considered cloning the LLVM repo know what we mean by East/West. As such I'm going to leave the internal code that way for now. (and the name of the TokenAnalyzer Files)

I think that's totally reasonable. People working on dev tools are more likely to be aware of newer terminology anyway.

However, I have added support for Left/Right and East/West in the config, but I'm going to refrain from adding Before/After otherwise it just gets silly having too many options.

Sounds good to me!

Whilst I've been developing this I've tried both ways, to be honest, I instinctively know what is meant by East Const because its the style I don't use myself (don't shoot me for that comment). But when talking in terms of Left/Right I feel I have to think about what it means quite hard. Especially as Right feels Wrong to me too!

Would it help if we named the option ConstPlacement (or something along those lines) instead of ConstStyle as a reminder that this is about the placement of the qualifier relative to the base type? Or we could keep ConstStyle and go with OnRight|OnLeft (in addition to East|West) if that reads more clearly.

Let me reiterate my goal. For me the goal was to make east/west const conversations disappear in the same way that tab and whitespace conversations have disappeared (mostly) because I think those conversations are a waste of good conference time.

I think it's a great goal and I'm really looking forward to having this option in clang-format -- thank you for working on this feature!

steveire added inline comments.May 26 2020, 2:12 PM
clang/lib/Format/EastWestConstFixer.cpp
22 ↗(On Diff #266046)

Should this be removed?

MyDeveloperDay marked an inline comment as done.

Fix issue when preprocessor #if/#else is present
Rename the config file name to ConstPlacement
change the command line option to be --const-placement
Add Left/Right into the documentation

rebase with master

MyDeveloperDay marked 8 inline comments as done.May 26 2020, 3:46 PM
MyDeveloperDay added inline comments.
clang/lib/Format/EastWestConstFixer.cpp
139 ↗(On Diff #266046)

I think restrict only occurs the other side of the * am I right?

157–158 ↗(On Diff #266046)

I would like to cover as many cases as possible, but a small part of me thinks that at least initially if we skip a case or two like this I'd be fine.

But I'll take a look and see what I think we could add easily in terms of multiple simple types in a row.

rsmith added a subscriber: rsmith.May 26 2020, 4:07 PM

I'm uncomfortable about clang-format performing this transformation at all. Generally, clang-format only makes changes that are guaranteed to preserve the meaning of the source program, and does not make changes that are only heuristically likely to be semantics-preserving. But this transformation is not behavior-preserving in a variety of cases, such as:

#define INTPTR int *
const INTPTR a;
// INTPTR const a; means something else

I also don't think this is something that a user would *want* in clang-format: changing the location of consts is something that would likely be done rarely (probably only once) when migrating to a different coding style, rather than being done on the fly after each edit to a source file.

Fundamentally, I don't think this transformation is simply reformatting, and I don't think it can be done sufficiently-correctly with only a largely-context-free, heuristic-driven C++ parser. As such, I think this belongs in clang-tidy, not in clang-format.

I'm uncomfortable about clang-format performing this transformation at all. Generally, clang-format only makes changes that are guaranteed to preserve the meaning of the source program, and does not make changes that are only heuristically likely to be semantics-preserving. But this transformation is not behavior-preserving in a variety of cases, such as:

#define INTPTR int *
const INTPTR a;
// INTPTR const a; means something else

At least in my case, our codebase doesn't have code like that. I wonder how common it is.

I also don't think this is something that a user would *want* in clang-format: changing the location of consts is something that would likely be done rarely (probably only once) when migrating to a different coding style, rather than being done on the fly after each edit to a source file.

I don't think this is true.

All of the arguments in favor of clang-format existing apply here.

  • Developers get it wrong. They put the { or the * in the "wrong" place according to the style guide, and they put the const in the "wrong" place according to the style guide.
  • clang-format is much faster than clang-tidy and it is reasonable to have text editors run it on every "Save" and on all files in a repo on every git commit etc.
  • The above means that the cost of developers getting it wrong is minimized or eliminated
  • The above means that developers don't have to pay attention to * placement or const placement while writing code, but can (in theory at least) focus on what they're trying to convey.
  • It seems that more tools understand clang-format than clang-tidy (eg text editors with support for it)

Fundamentally, I don't think this transformation is simply reformatting, and I don't think it can be done sufficiently-correctly with only a largely-context-free, heuristic-driven C++ parser.

I agree that this is a less simple transformation than existing clang-format functionality.

I can't evaluate whether your macro example (or other examples you could come up with) mean that it can't be done sufficiently-correctly, but I doubt I would come to the same conclusion. I would probably base that on whether I could find real-world code which it breaks, and how common the breaking-patterns (the macro in your example) actually are in other code.

As such, I think this belongs in clang-tidy, not in clang-format.

Given the speed difference and the developer convenience, I think this would be very unfortunate. I've already tried to write this conversion as a clang-tidy check. However, my implementation has far more bugs than this clang-format implementation, and it does not handle as many cases. You can see that many clang-tidy checks exclude types of code such as "all templates" from transformation to dampen the complexity of the check implementation.

Besides, a clang-tidy implementation would run into the same problem with your macro example, wouldn't it? Often the solution to that kind of problem in clang-tidy checks is to simply not transform code in macros. Would an option in clang-format to not transform around macro code be an more acceptable solution?

MyDeveloperDay marked an inline comment as done.May 27 2020, 1:02 AM

@rsmith, firstly let me thank you for taking the time to review this, I didn't realize i'd invoke such a reaction that the big guns would start showing up.. fundamentally I agree with you, but let me defend my thesis to help explain my reasons why I still feel this has value.

clang-format is in my view no longer just a code formatter used for transforming whitespace, This changed when we added the sorting of headers and the adding of namespace comments, but fundamentally clang-format has always worked on what is basically the tooling::Replacements interface, extending this to alter code is a likely natural progression.

Whilst I understand your point about clang-tidy, alas I agree with @steveire its not a viable solution in my view and I don't think it would catch the case you suggest either. Its also somewhat slower and needs so much more information, In very large million line projects its akin to a nightly build and not a sub second sanity check. (In my view clang-tidy is not the right tool for this job).

I do understand they will be failure scenarios, (and maybe there is something we can do about those), but...

Clang-format has always been able to break your code especially in extraneous circumstances where people use macros (the __LINE__ example here made me smile!)
https://stackoverflow.com/questions/37927553/can-clang-format-break-my-code

And a recent well publicized issue highlighted that even if we change something that seems semantically the same, it may not be! But that isn't a reason to not use clang-format in my view, its just a reason to be more careful.
https://travisdowns.github.io/blog/2019/11/19/toupper.html

But for me this change is NOT about those corner cases because clang-format is not bug free, it does make mistakes, as such it WILL break code, and even if it doesn't break it semantically, it will from time to time it makes it look like it threw up on your editor (or its drunk https://twitter.com/Manu343726/status/1124686897819934720) , but we have a workaround for those cases, //clang-format off for anything we can't fix and http://bugs.llvm.org for those we can.

You are correct in that this change is most useful during the conversion of a project, and as such its value might initially seem limited, this is why its an opt in option, Leave/East/West with the Default always being to "Leave as is". However anyone bringing in a new clang-format option or clang-format completely always needs to be careful when reviewing the changes it suggests, there will be corner cases (especially around macros) as there are with other clang-format options which can break the code.

My expectation is that its most useful usage is with the command line argument --const-placement=west/east being added to clang-format being run in dry-run mode as part of premerge type checking and used to reject code review which fails that.. I highlighted this once before in this trail run (in polly), where code has slipped in as east const into LLVM and missed during review.

clang-format -n --const-placement=west *.cpp

ScopBuilder.cpp:74:9: warning: code should be clang-formatted [-Wclang-format-violations]
static int const MaxDimensionsInAccessRange = 9;
        ^
ScopInfo.cpp:113:2: warning: code should be clang-formatted [-Wclang-format-violations]
int const polly::MaxDisjunctsInDomain = 20;
....
 ^

But for someone like me managing a multi million line C++ code base with developers in the 4 corners of the globe with a constant turnover of new staff, this is the ultimate value.. I no longer have to persuade some new senior engineer who thinks they have been hired for their coding prowess who insists their bracketing and coding style is the most beautiful in the world when the rest of us know its butt ugly.! Quite simply they cannot get their code committed unless it conforms to our style guide, not because I say so, but because the faceless tool of clang-format says "No!", there is value right there in reducing the arguments and stress alone.

LLVM's own pre-merge checking is reducing our need to keep tell people to clang-format in code review, and this change is about building that ability to reject code before it gets committed, to reduce the code review burden.

And Finally this change is about trying to banish the inane conversations about what "const" is "best const" in the same way as we have somewhat done with white-space and tabs. I just don't think we should waste our time talking about it, just clang-format it and "you do you", this is what I want out of this because I'm fed of up seeing brilliant minds (http://slashslash.info/2018/02/a-foolish-consistency/) argue about it when a 99.9% solution is a couple of 100 line patch.

This is the defense of my work and why I and I believe others think there is value here.

Fix crash whilst rechecking polly

../polly/lib/Analysis/ScopBuilder.cpp:74:8: warning: code should be clang-format
ted [-Wclang-format-violations]
static int const MaxDimensionsInAccessRange = 9;
       ^
../polly/lib/Analysis/ScopInfo.cpp:114:1: warning: code should be clang-formatte
d [-Wclang-format-violations]
int const polly::MaxDisjunctsInDomain = 20;
^
../polly/lib/Analysis/ScopInfo.cpp:119:8: warning: code should be clang-formatte
d [-Wclang-format-violations]
static int const MaxDisjunctsInContext = 4;
       ^
../polly/lib/Analysis/ScopInfo.cpp:2686:3: warning: code should be clang-formatt
ed [-Wclang-format-violations]
  auto const &DL = F->getParent()->getDataLayout();
  ^
../polly/lib/Analysis/ScopInfo.cpp:2822:3: warning: code should be clang-formatt
ed [-Wclang-format-violations]
  auto const &DL = F.getParent()->getDataLayout();
  ^
../../build_ninja/bin/clang-format -n ../polly/lib/CodeGen/*.cpp
../../build_ninja/bin/clang-format -n ../polly/lib/Plugin/*.cpp
../../build_ninja/bin/clang-format -n ../polly/lib/Transform/*.cpp
../polly/lib/Transform/Simplify.cpp:39:8: warning: code should be clang-formatte
d [-Wclang-format-violations]
static int const SimplifyMaxDisjuncts = 4;
       ^
../../build_ninja/bin/clang-format -n ../polly/lib/Support/*.cpp
../polly/lib/Support/SCEVAffinator.cpp:35:8: warning: code should be clang-forma
tted [-Wclang-format-violations]
static int const MaxDisjunctionsInPwAff = 100;
       ^
../polly/lib/Support/SCEVAffinator.cpp:39:8: warning: code should be clang-forma
tted [-Wclang-format-violations]
static unsigned const MaxSmallBitWidth = 7;
       ^

@MyDeveloperDay +1 from the trenches - I am in that same position and it took a lot of work to get the organization to _align_ on a consistent style, then enforce that.

I'm uncomfortable about clang-format performing this transformation at all. Generally, clang-format only makes changes that are guaranteed to preserve the meaning of the source program, and does not make changes that are only heuristically likely to be semantics-preserving. But this transformation is not behavior-preserving in a variety of cases, such as:

#define INTPTR int *
const INTPTR a;
// INTPTR const a; means something else

That's an excellent point. I agree that the formatting cannot change the meaning of the code that's being formatted -- that would be bad.

I also don't think this is something that a user would *want* in clang-format: changing the location of consts is something that would likely be done rarely (probably only once) when migrating to a different coding style, rather than being done on the fly after each edit to a source file.

I'm not certain I agree with this. For instance, you can run clang-format as a precommit step or as part of your CI pipeline to bark at users when they get formatting incorrect while working on a team project. (We do this internally with some of our internal projects.)

Fundamentally, I don't think this transformation is simply reformatting, and I don't think it can be done sufficiently-correctly with only a largely-context-free, heuristic-driven C++ parser. As such, I think this belongs in clang-tidy, not in clang-format.

I think clang-tidy has the machinery needed to do this properly, but I think clang-format is logically where I would go to look for the functionality (certainly before thinking of clang-tidy) because this is more related to formatting than not. That said, if we cannot make it work reliably within clang-format, I'd rather see it in clang-tidy than nowhere.

I also don't think this is something that a user would *want* in clang-format: changing the location of consts is something that would likely be done rarely (probably only once) when migrating to a different coding style, rather than being done on the fly after each edit to a source file.

I'm not certain I agree with this. For instance, you can run clang-format as a precommit step or as part of your CI pipeline to bark at users when they get formatting incorrect while working on a team project. (We do this internally with some of our internal projects.)

You can do the same with clang-tidy checks. If you're running clang-format but not clang-tidy as part of your CI, it would likely be worth your while to add clang-tidy to your set of CI checks regardless of what we do here. But if you want this only as a CI check, and not for reformatting code as you edit it, then for me that is a fairly strong signal that this does not belong in clang-format.

Fundamentally, I don't think this transformation is simply reformatting, and I don't think it can be done sufficiently-correctly with only a largely-context-free, heuristic-driven C++ parser. As such, I think this belongs in clang-tidy, not in clang-format.

I think clang-tidy has the machinery needed to do this properly, but I think clang-format is logically where I would go to look for the functionality (certainly before thinking of clang-tidy) because this is more related to formatting than not. That said, if we cannot make it work reliably within clang-format, I'd rather see it in clang-tidy than nowhere.

Right now we have a relatively clear line between the tools. clang-format does not parse or really understand the code, and just heuristically puts the whitespace and line breaks in the right place, in a way that is ~always correct. clang-tidy understands the meaning of the program and can suggest changes that are likely correct (but should typically be double-checked by a person). I think this kind of change is in the latter category.

I totally agree that it's reasonable to think of this as a reformatting change, just as I think it's reasonable to think of (say) reordering the data members of a class to the start as a reformatting change, or to think of moving an inline function definition out of the class definition as a reformatting change, or parenthesizing certain operators as a reformatting change -- and all of those could also reasonably be required by some house coding style. But I don't think clang-format is the right tool to perform those operations.

Begin to cover the cases raised as potential code-breaking changes by ignoring potential MACRO usage
add support for more complex const unsigned long long cases (I believe I can simplify this to a more general pattern)

@rsmith, Thank you for your comments, I don't agree, but thank you anyway.

I will continue to feel there is some value here, My hope is that others will feel the same.

@rsmith, Thank you for your comments, I don't agree, but thank you anyway.

I will continue to feel there is some value here, My hope is that others will feel the same.

To be clear: I do think a tool that will reorder specifiers to match a coding standard has value. My concern is that it seems like scope creep for clang-format in particular to do that, and that scope creep makes me uncomfortable -- we're moving further away from the set of transformations that clang-format can be very confident don't change the meaning of the program, and we have seen problems with such scope creep in the past. But I'm only uncomfortable, not opposed. Looking back through the review thread, I don't think I'm raising anything that @sammccall didn't already bring up, and it seems to me like you have consensus for doing this despite those concerns. So be it. :)

I think that if we are reordering const, we should be reordering all decl-specifiers -- I'd like to see int static constexpr unsigned const long inline reordered to something like static constexpr inline const unsigned long int too. Applying this only to const seems incomplete to me.

clang/lib/Format/EastWestConstFixer.cpp
199 ↗(On Diff #266647)

(Typo: youself -> yourself)

291 ↗(On Diff #266647)

(Typo: longl -> long)

293 ↗(On Diff #266647)

There can be more than four type-specifiers / cv-qualifiers in a row. Eg: unsigned long long int volatile const -> const volatile unsigned long long int.

clang/lib/Format/EastWestConstFixer.h
1 ↗(On Diff #266647)

(Typo in file name.)

curdeius added inline comments.May 27 2020, 4:31 PM
clang/docs/ReleaseNotes.rst
343

It sounds strange as if you wanted to write "[in order] to auto-arrange".

clang/lib/Format/EastWestConstFixer.cpp
83 ↗(On Diff #266647)

Nit: NewType may be misleading when reading. Why not NewText as above?

130 ↗(On Diff #266647)

These functions seem a bit ugly... and very specific. And they both look like rotate left/right. Couldn't it be a single function taking a range/span/collection of FormatTokens?

136 ↗(On Diff #266647)

Nit: West doesn't seem appropriate as a name at the level of this function as you rather don't move elements west/east but left/right. Of course, that applies only if you share my view that it's sort of rotate algorithm.

171 ↗(On Diff #266647)

Parentheses seem to be unnecessary.

171 ↗(On Diff #266647)

Stupid question, are both const and restrict handled here?

175 ↗(On Diff #266647)

Typo: its -> it's.
Missing comma before "it could".

176 ↗(On Diff #266647)

Missing dot at the end.

195 ↗(On Diff #266647)

Why? What about unsigned const int?

208 ↗(On Diff #266647)

I think that this code asks for a cleanup. And if we needed to look for five tokens...?

213 ↗(On Diff #266647)

Shouldn't it be else if?

223 ↗(On Diff #266647)

Nits: double space, missing ending dot.

225 ↗(On Diff #266647)

Nit: unnecessary blank line.

228 ↗(On Diff #266647)

"from to the end"?

244 ↗(On Diff #266647)

Forgotten remnants?

383 ↗(On Diff #266647)

It's a matter of taste, but this condition could be moved into the while condition above.

387 ↗(On Diff #266647)

You would avoid repetition of this statement if you changed while loop into for loop.

One last thought, how about making the config a bit more future-proof and instead of ConstPlacement accept what was discussed some time ago:

QualifierStyle:
   -   const:     Right

and restrict it to const for the moment?
Maybe renaming to QualifierPlacement or something more appropriate.

MyDeveloperDay planned changes to this revision.May 28 2020, 1:16 AM

Minor change for the simpler review comments before refactoring some of the more involved ones

MyDeveloperDay planned changes to this revision.May 28 2020, 3:20 AM
MyDeveloperDay marked 16 inline comments as done.
MyDeveloperDay marked an inline comment as done.May 28 2020, 3:48 AM
MyDeveloperDay added inline comments.
clang/lib/Format/EastWestConstFixer.cpp
130 ↗(On Diff #266647)

This is something I'd also started to feel the same, now I'm starting to handle longer sets of qualifier, I can replace all these swap functions for one single rotate.

static void rotateTokens(const SourceManager &SourceMgr,
                         tooling::Replacements &Fixes, const FormatToken *First,
                         const FormatToken *Last, bool Left) {
  auto *End = Last;
  auto *Begin = First;
  if (!Left) {
    End = Last->Next;
    Begin = First->Next;
  }

  std::string NewText;
  // If we are rotating to the left we move the Last token to the front.
  if (Left) {
    NewText += Last->TokenText;
    NewText += " ";
  }

  // Then move through the other tokens.
  auto *Tok = Begin;
  while (Tok != End) {
    if (!NewText.empty())
      NewText += " ";

    NewText += Tok->TokenText;
    Tok = Tok->Next;
  }

  // If we are rotating to the right we move the first token to the back.
  if (!Left) {
    NewText += " ";
    NewText += First->TokenText;
  }

  auto Range = CharSourceRange::getCharRange(First->getStartOfNonWhitespace(),
                                             Last->Tok.getEndLoc());
  auto Err = Fixes.add(tooling::Replacement(SourceMgr, Range, NewText));
  if (Err) {
    llvm::errs() << "Error while rearranging const : "
                 << llvm::toString(std::move(Err)) << "\n";
  }
}
MyDeveloperDay marked 3 inline comments as done.May 28 2020, 7:45 AM

One last thought, how about making the config a bit more future-proof and instead of ConstPlacement accept what was discussed some time ago:

QualifierStyle:
   -   const:     Right

and restrict it to const for the moment?
Maybe renaming to QualifierPlacement or something more appropriate.

I'm already feeling there needs to be more fine grained control here in the config... (even if that is a list of MACRO types to ignore) or IgnoreCapitializedIdentifiers, or WarnButDontFix or as we talked about before handling more than just the placement of const, I feel you are correct I'll end up polluting the toplevel config namespace unless I switch to some form of structure in the YAML like we do with BraceWrapping.

clang/lib/Format/EastWestConstFixer.cpp
195 ↗(On Diff #266647)

@curdeius would you help me understand your expectation here?

  • east: unsigned int const
  • west: const unsigned int

?

293 ↗(On Diff #266647)

you have the volatile moving too? if you had the choice would it become:

  • const unsigned long long int volatile
  • const volatile unsigned long long int
  • volatile const unsigned long long int

Any reason why? or is that personal taste? what would be the ideal?

157–158 ↗(On Diff #266046)

@aaron.ballman if you saw

long const long unsigned what would you expect it to become in both east and west const cases?

my assumption is:

  • east : long long unsigned const
  • west: const long long unsigned

Or something else?

same for

unsigned volatile long const long * restrict I assume:

  • east : unsigned volatile long long const * restrict
  • west: const unsigned volatile long long* restrict

Its it would help if I understood the typical expectation in these situations?

That's a rotate!

Remove the multiple swap functions for a single rotateTokens function (Thanks for the inspiration @curdeius)

extract the various combination of 2,3,4,5 qualifier types to a simple begin and end then pass those to the rotate.

Updating the diff prior to a potential config change

MyDeveloperDay planned changes to this revision.May 28 2020, 7:56 AM
MyDeveloperDay marked 2 inline comments as done.
curdeius marked an inline comment as done.May 28 2020, 8:16 AM
curdeius added inline comments.
clang/lib/Format/EastWestConstFixer.cpp
195 ↗(On Diff #266647)

Yes, precisely this. And as for all other cases, I would only move const, nothing else.

293 ↗(On Diff #266647)

Given the size of this revision, it would be probably wiser not to touch anything else than const.

MyDeveloperDay marked an inline comment as done.May 28 2020, 10:41 AM
MyDeveloperDay added inline comments.
clang/lib/Format/EastWestConstFixer.cpp
195 ↗(On Diff #266647)

Ok that will now will work (but I'll add these specific unit tests to prove it)

unsigned const int

will see unsigned const and then swap it to be const unsigned

resulting in

const unsigned int

similar will happen in the "east" const sense too (but again I'll add the tests)

@rsmith, Thank you for your comments, I don't agree, but thank you anyway.

I will continue to feel there is some value here, My hope is that others will feel the same.

To be clear: I do think a tool that will reorder specifiers to match a coding standard has value. My concern is that it seems like scope creep for clang-format in particular to do that, and that scope creep makes me uncomfortable -- we're moving further away from the set of transformations that clang-format can be very confident don't change the meaning of the program, and we have seen problems with such scope creep in the past. But I'm only uncomfortable, not opposed. Looking back through the review thread, I don't think I'm raising anything that @sammccall didn't already bring up, and it seems to me like you have consensus for doing this despite those concerns. So be it. :)

I share the concern about this functionality causing a reformat to change the behavior of the program. I think a condition of accepting this feature into clang-format is that it not change the meaning of the user's code (except perhaps in very rare circumstances where there's consensus we should not care about that kind of input source code). If that means we can't have the feature in clang-format and need it in clang-tidy, so be it.

I think that if we are reordering const, we should be reordering all decl-specifiers -- I'd like to see int static constexpr unsigned const long inline reordered to something like static constexpr inline const unsigned long int too. Applying this only to const seems incomplete to me.

Agreed; I brought that up earlier in the review and still think it's valuable.

clang/lib/Format/EastWestConstFixer.cpp
293 ↗(On Diff #266647)

Any reason why? or is that personal taste? what would be the ideal?

It's a bit odd to only handle one qualifier and it's been my experience that people usually want their qualifiers grouped together. While I'm sure there are opinions on the ordering *within* the qualifiers where there are multiple different ones, I would guess that there's not a lot of people who would argue that they want their const qualifiers on the left and their volatile qualifiers on the right.

Btw, don't forget that attributes also need to be handled properly. e.g.,

long static constexpr unsigned __attribute__((address_space(0))) long const int i = 12;

Also, we probably need a test case where the qualifiers have been duplicated, like in this lovely const fort:

const const const
const  int  const
const const const i = 12;

Do we want that to produce const const const const const const const const int i = 12; or const int i = 12;?

157–158 ↗(On Diff #266046)

I would assume the same thing you're assuming in these cases. Similar for:

long static constexpr unsigned long const int i = 12;

// East
static constexpr unsigned long long int const i = 12;
// West
static constexpr const unsigned long long int i = 12;

Uncertain how others feel, esp about the non-qualifier behavior, which I envision being: keep the non-qualifiers in whatever order they appear in the original source, shift the qualifiers left/right as needed (keeping the order in which they appear, if we decide to support all qualifiers instead of just const as part of this patch).

sammccall added a comment.EditedMay 29 2020, 12:59 AM

Random thoughts from the peanut gallery:

  • clang-format *is* the place I'd expect/want this feature, as a user. It's way more similar to int *x -> int* x than it is to e.g. typical clang-tidy rewrites. My only concern is whether we can give it the safety users expect.
  • clang-tidy has a much higher barrier to entry: slow runtime, complex configuration, build-system integration, more false-positives in common scenarios. e.g. Google has a pretty sophisticated CI system that integrates both: clang-format is typically blocking and clang-tidy is typically not. It would be less useful in clang-tidy.
  • AIUI there are two identified sources of unsafety. They're real costs for sure, we should mitigate them as much as possible.
    • bugs: it seems reasonably likely these can be identified and fixed, based on what we've seen in this patch. I'd be more worried about maintenance if this was a drive-by contribution, but it's the opposite of that.
    • macros: there is some precedent for clang-format being bad or even unsafe in the presence of unknown macros. We could include a list of "don't touch qualifiers around" macro names in config. @klimek has a general-purpose mechanism nearly ready where you can provide actual macro definitions in config. It's also unclear if there's a common pattern where we'd see this.
    • typedefs don't introduce any such issues, right?
  • one general mitigation is not including this in any default styles. We could go further and not support setting it in a style file (only as a flag) for clang-format 11, to shake out bugs.
  • I think const is by far the most important qualifier to handle: volatile is rare, others cause less semantic confusion and are generally less controversial. IMO it's fine for this patch to only handle const as long as we know how the configuration could be expanded in future. Shipping is a feature.
  • To bikeshed the configuration once more: what about QualifierOrder: [Const, Type]? This seems fairly self-explanatory, sidesteps "left" vs "west", and expands naturally to [Static, Const, Type] in future. It requires some nontrivial validation though.

Here's some more failing testcases.

class Aa;
class A;
struct timespec;

// Crash
// #define UL unsigned long

// Transformed, but with error reported:
bool foo(Aa const &);

// Not transformed (uppercase)
template <typename T> bool bar(T const &);
template <typename TYPE> bool bat(TYPE const &);
bool bing(A const &);

// const inserted after struct. Does not compile
void fot(struct timespec const t);

// Not transformed
template <typename Type> void tov(typename Type::SubType const tu);

// const inserted after typename. Does not compile
template <typename Type> void tor(typename Type::SubType const &tu);
template <typename Type> void top(typename Type::SubType const *tu);

// const inserted after `TYPE::` (because uppercase?)
template <typename TYPE> void top(typename TYPE::SubType const *tu);

This time I used

BasedOnStyle: LLVM
PointerAlignment: Left
ConstPlacement: West

It seems to me that the heuristic "if the token is all uppercase assume it's a macro containing a * or &" is not the right heuristic. The mechanism of allowing the user to specify problematic macros in the config seems to make more sense.

@steveire Thanks for the additional test cases, I'm reworking how I handle the Macro case as I realized that I didn't actually even try to change the case @rsmith came up with in the first place. I made a decision previous that I couldn't handle any case without the *,& or && to determine that wasn't the case

Just ignoring upper case isn't a good idea either because of LONG,HRESULT and LRESULT types.

it may take me a couple of days, but I agree to have something akin to FOREACH macros might be the way to go.

@steveire Thanks for the additional test cases, I'm reworking how I handle the Macro case as I realized that I didn't actually even try to change the case @rsmith came up with in the first place. I made a decision previous that I couldn't handle any case without the *,& or && to determine that wasn't the case

Just ignoring upper case isn't a good idea either because of LONG,HRESULT and LRESULT types.

it may take me a couple of days, but I agree to have something akin to FOREACH macros might be the way to go.

I think I've found more bugs in the current implementation, but I think you're working on a totally different implementation now, so there may not be so much value in reporting bugs with the current implementation, right?

connojd added a subscriber: connojd.Sep 3 2020, 8:51 PM
Typz added a subscriber: Typz.Sep 10 2020, 8:02 AM