This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Fix some issues in clang-format's AlignConsecutive modes
AcceptedPublic

Authored by bmharper on Jun 13 2016, 1:50 AM.

Details

Reviewers
djasper
berenm
Summary

This patch fixes issues with AlignConsecutiveAssignments and
AlignConsecutiveDeclarations such as this:

Before:
  int fun1(int a);
  double fun2(int b);

  typedef pair<key_type, T> type1;
  typedef pair<X, Y> type2;

After:
  int    fun1(int a);
  double fun2(int b);

  typedef pair<key_type, T> type1;
  typedef pair<X, Y>        type2;

and...

Before:
  void fun(int x = 1) {
      int y      = 2;
  }

After:
  void fun(int x = 1) {
      int y = 2;
  }

The old alignment function was incapable of maintaining alignment whenever
the scope changed. To illustrate - in the first example mentioned, the
alignment of fun1 is lost by entering the nested scope of (int a).
This would cause the alignment to give up, and cause fun2 to start from a
blank slate. It would also cause false alignment, which is illustrated in
the second example above.

My primary motivator for this change is to have a list of function prototypes
line up.

This modification changes the alignment function so that it calls itself
recursively, at each change in scope depth. This allows it to maintain state
across different scope depths. A performance test against the current master
branch reveals a small (2.7%) speedup.

There are some new test cases which stress this functionality. In addition,
there were two historical test cases marked as "FIX ME", which now work as
intended.

In order to sense check, I have run this new implementation against the Clang
source code, with AlignConsecutiveAssignments:true and
AlignConsecutiveDeclarations:true. I then compared the old output with those
settings, vs the new output, with the same settings. All the changes that I
observed are explainable by the new logic, and IMO the formatted code looks
better with the new method.

Diff Detail

Event Timeline

bmharper updated this revision to Diff 60492.Jun 13 2016, 1:50 AM
bmharper retitled this revision from to Fix some issues in clang-format's AlignConsecutive modes.
bmharper updated this object.
bmharper set the repository for this revision to rL LLVM.
bmharper added a project: Restricted Project.
bmharper added a subscriber: Restricted Project.

Generally, please subscribe cfe-commits when sending patches via phab.
See http://llvm.org/docs/Phabricator.html

kfunk added a subscriber: kfunk.Jun 13 2016, 3:51 AM

Generally, please subscribe cfe-commits when sending patches via phab.
See http://llvm.org/docs/Phabricator.html

You should set up a Herald rule then: https://secure.phabricator.com/book/phabricator/article/herald/ -- you can automatically subscribe users based on rules.

djasper edited edge metadata.

Could you upload a diff with full (i.e. the entire file as) context?

As requested - full diff (of all three files)

berenm edited edge metadata.EditedJun 13 2016, 11:00 AM

I think it's better to update the diff with full context (git diff -U999999) rather than attach the file, in order to have it integrated into phabricator diff view. Or use the arcanist command-line tool, which should do it for you.

Except from that, from a quick reading it looks good to me and I'm happy that someone fixed these cases, I see them from time to time actually but never had time to work on a proper fix.

bmharper updated this revision to Diff 60569.Jun 13 2016, 11:35 AM
bmharper edited edge metadata.
bmharper removed rL LLVM as the repository for this revision.

diff with full context

There's still cases where the nesting level is still not correctly handled: when using unbraced conditionals / loops. For example (sorry, silly example):

for (auto index = 0, e = 1000; index < e; ++index)
  int v = 0;
long double value = 1;

is aligned to

for (auto index = 0, e = 1000; index < e; ++index)
  int       v     = 0;
long double value = 1;

I'm not sure how to detect these unbraced scopes, maybe by also looking for different IndentLevel when computing the ScopeLevel?

It looks like it doesn't like the operator[] either:

struct test {
  long long int foo();
  int operator[](int a);
  double bar();

  long long int foo();
  int operator()(int a);
  double bar();
};

becomes:

struct test {
  long long int foo();
  int operator[](int a);
  double bar();

  long long int foo();
  int           operator()(int a);
  double        bar();
};

Interesting. Working on it!

bmharper updated this revision to Diff 60830.EditedJun 15 2016, 6:59 AM
bmharper set the repository for this revision to rL LLVM.

Fix the recent two issues mentioned by Beren, ie the single-statement scopes (for loop without braces), and operator[] alignment.

Thanks for the IndentLevel suggestion - it is the best solution I can see.

berenm added a comment.EditedJun 20 2016, 4:22 AM

Thanks! The operators are now correctly handled.

Another thing I've found is that constructors aren't aligned either with other declarations or together. Do you think it would be possible to treat them as functions as well?

Friend functions also aren't aligned with the other functions, but I'm not sure why or even if they should be. I believe most of the time friend functions are declared in a separate declaration "group" anyway.

struct A {
  explicit A(int);
  A();
  unsigned operator=(int a);
  long     bar(int a);
  friend void     foo();
  friend unsigned baz();
};

I've taken some time to investigate those two issues, and these are my thoughts:

  1. Constructor alignment: I think this is a good thing to do, but if isFunctionDeclarationName, and it's caller TokenAnnotator::calculateFormattingInformation are anything to go by, adding support for detection of constructors is going to be pretty hairy. I think I can see a way to do it, but it involves adding yet more complexity to TokenAnnotator::calculateFormattingInformation, and I'm not sure it's worth the effort. See TokenAnnotator.cpp for reference.
  1. friend functions: I don't really understand why the current behavior is what it is, but I think it's reasonable to argue that it actually improves readability by drawing attention to the fact these are friend functions, which ought to be quite rare in most code
berenm added a comment.EditedJun 21 2016, 2:18 AM
  1. friend functions: I don't really understand why the current behavior is what it is, but I think it's reasonable to argue that it actually improves readability by drawing attention to the fact these are friend functions, which ought to be quite rare in most code

Actually it looks like it works now... I'm not sure what I did when I had this misalignment. It would have been fine by me anyway.

Regarding constructors, your comment seems reasonable to me. The patch already improves the current state, so I think it's good like it is and further improvements could be added later on.

Thanks!

Ping @djasper for his review and eventual merge.

lib/Format/WhitespaceManager.cpp
98

Maybe we could spare the computation if we aren't going to align anything?

Is it better for clarity to always compute additional information? @djasper what's the Clang way to do?

berenm accepted this revision.Jun 21 2016, 2:18 AM
berenm edited edge metadata.
This revision is now accepted and ready to land.Jun 21 2016, 2:18 AM

Hi Daniel,
Is there anything else that I need to do here?

Regards,
Ben

Friendly PING.

Please let me know if there's anything else that I need to do here,
otherwise I'll keep quiet and expect a merge at some point?

Sorry.. Really busy at the moment, but will try to get this reviewed and submitted this week. If not, please ping again!

kfunk edited edge metadata.Jun 28 2016, 7:51 AM
kfunk removed a subscriber: kfunk.

PING

lib/Format/WhitespaceManager.cpp
98

That's a good point. One certainly could elide that if alignment was turned off. I think so long as it was mentioned in the comments of the ScopeLevel member variable, it would be OK to do so. However, I'll also just defer this decision to @djasper.

Sorry :(... Review is easy enough.. Feel free to ping me more often in the future.

lib/Format/WhitespaceManager.cpp
98

Yeah, just avoid unnecessary work.

lib/Format/WhitespaceManager.h
177

NestingLevel does include braces, generally. However, there are two types:

  • Braced initializers: These should just work as is.
  • Braces that open blocks: Here, child lines are created and so the tokens within a block restart from NestingLevel 0. However, taking that NestingLevel in combination with the Level of the AnnotatedLine should work.

I think reusing that is highly preferable over implementing yet another parentheses counting.

bmharper updated this revision to Diff 68675.Aug 19 2016, 5:25 AM
bmharper removed rL LLVM as the repository for this revision.

Fixed the two issues pointed out by djasper in his most recent comments:

  1. Only calculate ScopeLevel when necessary.
  2. Instead of calculating ScopeLevel by inspecting {[(< and their closing pairs, calculate it by combining IndentLevel and Nesting Level. It's not quite as simple as just making ScopeLevel = IndentLevel + NestingLevel, but at least we don't have yet another function computing scope depth from scratch.

I think instead of doing some complex computation with LineLevel and NestingLevel, it might be better to just leave them as the pair and compare them as a pair. The LineLevel should probably always trump the NestingLevel. So, I'd try to just defined ScopeLevel as a pair<int, int>, put both the LineLevel and the NestingLevel in it and use that for the comparisons. I might be overlooking something, though.

The reason one has to precompute ScopeLevel is because IndentLevel is not actually meaningful on each token. It's only meaningful for the first token on the line (the remaining tokens on the line have IndentLevel = 0). So if you look at the implementation of calculateScopeLevel(), you'll see that the function "remembers" the most recent meaningful IndentLevel, and copies that into ScopeLevel for all subsequent tokens on the same line.
Now, one could argue that IndentLevel ought to be the same for all tokens on a line, but that seems to me like a much bigger change, that would propagate into many other parts of the code.

I think the IndentLevel in WhitespaceManager (and the nested Change) is a horrible mess and should be cleaned up. It gets set either to 0 or to the "Level" of the AnnotatedLine. To me only the latter makes sense as the line defines the indent level. Everything else, including recomputing this and introducing a ScopeLevel makes the current situation worse. I can try to do this in advance of this patch, if you prefer.

I'll see what happens if I make IndentLevel the same for all tokens on a line. If not too much is broken by that, I should be able to handle it.

bmharper updated this revision to Diff 69671.Aug 30 2016, 6:38 AM

I have added an initial phase which propagates IndentLevel from the first token on a line, to all of the tokens on that line. This allows us to get rid of the ScopeLevel state. However, I have retained the name "ScopeLevel", and made it a member function of Change. I think this is better than putting IndentLevel and NestingLevel inside an std::pair, because by retaining the words IndentLevel and NestingLevel, it's easy to navigate the rest of the source code and discover where those values come from. Additionally, the function ScopeLevel() needs to execute one tiny, but vital piece of logic, in order to be useful for alignment purposes. One cannot simply add IndentLevel and NestingLevel together. This is explained in detail inside the body of the ScopeLevel() function.

PING!
My previous commit hopefully addressed the issue with the sprawl of IndentLevel + ScopeLevel

djasper added inline comments.Sep 10 2016, 1:46 AM
lib/Format/WhitespaceManager.cpp
50

What I don't understand is why you have to combine NestingLevel and IndentLevel at all. To me it feels wrong to add them no matter what (with and without your extra bit of logic).

IMO, for alignment, we should ensure that both NestingLevel *and* IndentLevel are the same, not just the the sum of the two is the same. That's why I was suggesting putting them into a pair and just comparing the pair. But I might be missing something very obvious.

nikola added a subscriber: nikola.Sep 11 2016, 1:47 AM
nikola added inline comments.
lib/Format/WhitespaceManager.cpp
345

Comment is out of date, it's still talking about scope depth.

So.. I finally got some time to look at this again:

Quick Recap - IndentLevel and NestingLevel are now stored separately inside WhitespaceManager::Change. I've added a function ScopeLevel() which combines them with a bit of logic, and returns a number that can be used for alignment scope detection purposes.
Now the reason why I don't want to combine IndentLevel and NestingLevel into one value:

IndentLevel is used by a function called WhitespaceManager::appendIndentText. I don't understand exactly what this function is doing, and despite some attempts, I haven't managed to craft an input which gets it to run down the code paths I'm interested in. Now as far as I can tell from the experiments I've been able to come up with, it's OK to combine IndentLevel and NestingLevel into a single number, because it has no observable effect on appendIndentText. HOWEVER, just because I can't reproduce the conditions necessary for that code to run, doesn't mean there isn't some body of code out there that does stress those code paths. It seems like a reasonable approach to me to maintain the separate variables IndentLevel and NestingLevel, primarily because those keywords are searchable in the rest of the code, and it's easy to trace their lineage back to the places where they're generated. If we were to combine them, and appendIndentText happens to be broken by this change, then it's going to be very confusing for the next guy to come and figure out why this is so. At present, IndentLevel means what it says, and so does NestingLevel, and I believe, so does ScopeLevel, so I think it's a better idea to keep them separate.

Ben

berenm added a comment.EditedSep 23 2016, 5:45 AM

Hello,

I had a little bit of look into the NestingLevel field. I understand that it only indicates the nesting level of the token inside the current AnnotatedLine, which could very well be the same as the nesting level of another token in the previous or next AnnotatedLine. Right now, it doesn't include the information on the nested level of the AnnotatedLine itself (this is in the Level field of the line itself).

I'm not sure if the value of IndentLevel comes directly from the AnnotatedLine's Level, but I believe that combining them would give an equivalent of a "global" nesting level of the token.

In order to be cleaner, I think it could be done in the AnnotatingParser class, that already fills the NestingLevel field. I wrote a patch that demonstrates this here: https://reviews.llvm.org/D24859. All the unit tests are passing, although I'm not 100% sure about all the implications this change has. With this patch, I believed NestingLevel can be used directly to determine begin and end of alignment sequences, without requiring the ScopeLevel helper function.

Are we talking completely past each other? I specifically think we should *NOT* combine NestingLevel and IndentLevel into one value. Not in ScopeLevel() and not anywhere else.

Are we talking completely past each other? I specifically think we should *NOT* combine NestingLevel and IndentLevel into one value. Not in ScopeLevel() and not anywhere else.

Ok, I probably misunderstood the situation, sorry.

bmharper updated this revision to Diff 72444.Sep 26 2016, 12:52 AM

This revision merges NestingLevel and IndentLevel into an std::pair, as suggested by @djasper.
IMO this makes the code slightly harder to read, because you lose the unique variable names "IndentLevel" and "NestingLevel". Those are now replaced by .first and .second, respectively. In addition, NestingLevel is no longer equal to the original NestingLevel that gets passed in, because it needs to be tweaked slightly to better work with the recursive technique we use for alignment scope.
However, the approach presented here certainly does work, and it's actually not too much of a code change from the previous patch.

kfunk added a subscriber: kfunk.Sep 26 2016, 1:29 AM
kfunk removed a subscriber: kfunk.

So sorry. Seems I forgot to hit "Submit" :(.

If you don't like the ".first" and ".second" of the pair, you could introduce a struct for it and overload operator<. Might actually be more readable.

lib/Format/WhitespaceManager.cpp
71–72

nit: Move these to the previous line. clang-format won't do that, because of the comment, but that's actually irrelevant here.

182

I don't see why this would be necessary. If I remove it, all tests do still pass.

189

Also, this comment seems wrong? The "int x = 1;" actually starts a new (child) line. If that has the same nesting level, that seems like a bug we need to fix.

191

Make this (and maybe a few others) more concrete. Don't write "some special tokens", write what they actually are.

195

If the example isn't too long, writing the source code in the comment seems better than referencing the test.

373

Either do:

EndOfSequence = StoppedAt;

or just remove StoppedAt and use i.

unittests/Format/FormatTest.cpp
8733

Can you add a test case where there is a line wrap after the "("?

bmharper updated this revision to Diff 81213.Dec 13 2016, 5:18 AM

Sorry for the incredibly long turnaround time on this one - I don't have any legitimate excuse, and I know it just makes reviewing it harder.

Anyway - I have sorted out all the issues mentioned in the last review. One thing that you suggested was to go ahead and replace the std::pair for NestingAndIndentLevel, which I did write, but I feel like it adds quite a bit of bloat, because you need two constructors, operator==, operator!=, operator<, operator>, so I'm not convinced that it's worth it in the end. If you still think it is, then I'll go ahead and add it.

Regards,
Ben

berenm added a comment.Jan 9 2017, 1:19 PM

Pinging @djasper

There's https://reviews.llvm.org/D27651 that will conflict with this one.

enyquist added a comment.EditedJan 9 2017, 4:57 PM

Hey bmharper :) I've got a review open that conflicts with this one, just having a look to see what I'll need to refactor
(https://reviews.llvm.org/D28462).

In fact, I have a question-- the conflict is specifically in WhitespaceManager.cpp. Since I needed to detect PP macros containing changes in scope depth (code blocks surrounded by curly braces, macro parameter lists, etc), I was having the same problem as you-- AlignTokens was bailing out whenever the scope depth changed.

In my case, I just added a new parameter to AlignTokens, MaxNestingLevelIncrease, indicating how much the level can increase before we stop alignment, making the allowable scope-depth configurable. For example, calling AlignTokens with this value set to 2 will cause alignment to continue up until we increase scope by two levels.

Now, my question- from what I can tell of your changes, it looks like my code can actually be simpler when this gets merged. The state of AlignTokens will be maintained across changing scope depths, and I won't need to modify AlignTokens so that it can survive something like "#define foo(x) ((x * 2) + 2)". Is this correct?

Hi @enyquist,
I'd like to guess that this patch will solve your problem, but I'm not intimate enough with this code to give you a definitive answer. I hope we can get this merged soon, so that you can just run it and see.

Ben

Well, your patch is here for me to try, and it looks like it's been accepted. So I guess I should just pull my finger out and try it :)
Thanks for your response-- I'll let you know if I come across any issues.

Pinging @djasper. Any chance we can get this merged?

djasper added inline comments.Jan 23 2017, 12:36 AM
lib/Format/WhitespaceManager.cpp
207

Merge the two ifs into a single one?

318

I'd probably move the second condition into an early exit inside the loop:

if (Changes[i].NestingAndIndentLevel >= NestingAndIndentLevel)
  break;
394

Use a comment to describe the literal, i.e.:

/*StartAt=*/0
lib/Format/WhitespaceManager.h
134

This comment seems outdated.

217

This function should be a local function in the .cpp file. Also, can you describe in more detail what this does, i.e. what kind of declarations are covered here? Also, it is a bit confusing to have a function and a member of WhitespaceManager::Change with the exact same name.

I apologize in advance if this causes merge conflicts with r293616. However, I do hope that that actually makes this patch easier.

Thanks - the merge conflicts don't look too bad. I should have it cleaned up by tomorrow.

Thanks - the merge conflicts don't look too bad. I should have it cleaned up by tomorrow.

Have you had a chance to do the merge yet? If not I have a merged version which passes the tests that I could post here if you want.

bmharper updated this revision to Diff 87178.Feb 5 2017, 8:40 PM

Hi @djasper,
I've made the latest bunch of review changes, and rebased onto master. I didn't check the numbers, but I think the code is slightly smaller after the rebase.

Regards,
Ben

bmharper updated this revision to Diff 87179.Feb 5 2017, 8:44 PM

Fixed a stale comment

Thanks @daphnediane. I only read your comment after merging, but that would have been helpful.

djasper added inline comments.Feb 6 2017, 12:00 AM
lib/Format/WhitespaceManager.cpp
164

I don't think you need this struct now. Just use the FormatToken itself, it should have all of this information.

169

This is no longer true. IndentLevel should be set correctly for every token.

263

nit: s/exits/returns/

(or maybe even "returns the current position")

334–338

The "!=" is a bit confusing here. ">" would do the same thing, right (because "<" is already handled above)?

lib/Format/WhitespaceManager.h
149

I think you can get rid of this field now and just directly use the values stored in the tokens.

Thanks @daphnediane. I only read your comment after merging, but that would have been helpful.

If you still want my diff let me know as it is slightly different from yours. No longer has NestingAndIndent level as a data member of Changes ( but has a inline method that gets the values though not 100% sure that is needed as I experimented with a version without it), no longer needs propagateIndentLevels, etc.

bmharper updated this revision to Diff 87361.Feb 6 2017, 8:40 PM

This looks a lot better IMO. Thanks @daphnediane - you should recognize the code ;)
The special closing paren logic inside of nestingAndIndentLevel() is indeed redundant now.

djasper accepted this revision.Feb 7 2017, 3:08 AM

This looks very nice now :-D. Thanks for working on this!!

lib/Format/WhitespaceManager.cpp
189

Maybe add: "It contains indices of the first token on each scope."

bmharper updated this revision to Diff 87478.Feb 7 2017, 10:33 AM

small comment tweak

Thanks for all the code review work! I'm impressed by the quality standard maintained here.

Rebuilt with the latest patch and got one compile error. See line comment worked okay after fixing it.

lib/Format/WhitespaceManager.cpp
207

These ifs can get merged again, when you merged my changes in it was based on a version before you merged them.

lib/Format/WhitespaceManager.h
176

Extra WhitespaceManager::Change:: prefix here

bmharper updated this revision to Diff 89463.Feb 22 2017, 7:49 PM

Fixed two small issues raised by @daphnediane

Hi @djasper,
This is the first patch I've contributed here, so I'm not familiar with the whole process. I assume this code is ready to land? When exactly does it get merged into master, and is there something else that I still need to do to make that happen?

Thanks,
Ben

Commit r298574, thanks for woking on this folks!

MyDeveloperDay retitled this revision from Fix some issues in clang-format's AlignConsecutive modes to [clang-format] Fix some issues in clang-format's AlignConsecutive modes.Jun 24 2020, 8:48 AM
MyDeveloperDay added a project: Restricted Project.