This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Fix AlignConsecutive on PP blocks
ClosedPublic

Authored by JakeMerdichAMD on May 4 2020, 7:42 PM.

Details

Summary

Currently the 'AlignConsecutive*' options incorrectly align across
elif and else statements, even if they are very far away and across
unrelated preprocessor macros.

This failed since on preprocessor run 2+, there is not enough context
about the #ifdefs to actually differentiate one block from another,
causing them to align across different blocks or even large sections of
the file.

Eg, with AlignConsecutiveAssignments:

\#if FOO      // Run 1
\#else        // Run 1
int a   = 1;  // Run 2, wrong
\#endif       // Run 1

\#if FOO      // Run 1
\#else        // Run 1
int bar = 1;  // Run 2
\#endif       // Run 1

is read as

int a   = 1;  // Run 2, wrong
int bar = 1;  // Run 2

The approach taken to fix this was to add a new flag to Token that
forces breaking alignment across groups of lines (MustBreakAlignBefore)
in a similar manner to the existing flag that forces a line break
(MustBreakBefore). This flag is set for the first Token after a
preprocessor statement or diff conflict marker.

Possible alternatives might be hashing preprocessor state to detect if
two lines come from the same block (but the way that ifdefs are
sometimes deferred makes that tricky) or showing the preprocessor
statements on all passes instead of just the first one (seems harder).

Fixes #25167,#31281

Diff Detail

Event Timeline

JakeMerdichAMD created this revision.May 4 2020, 7:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2020, 7:42 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
MyDeveloperDay accepted this revision.EditedMay 5 2020, 2:41 AM
MyDeveloperDay added a subscriber: VelocityRa.

I'm struggling with the simplicity of your solution ;-) (that's a compliment!)

I've pulled this patch and built it locally. It certainly had no detrimental effects on the sources I normally check clang-format against, and in some quick tests

#if FOO
int aaaaa  = 12;
int bbbbbb = 12;
#else
int a                           = 12;
#endif

#if BAR
int aaavvvvvaaa = 12; // ABC
#else
int a                           = 12; // ABC
#endif

#if BAZ
int aaavvvvvaaa = 12;
#if FOO
int aaaaaavvvvvaaa = 12;
#endif
#else
int a                           = 12;
#endif

#if A
#else
int aaaaaaaaaaaa                = 12; // ABC
#endif
#if B
#else
int a                           = 12; // ABC
#endif
#if B
#else
int verylongnamethatshouldalign = 12; // ABC
#endif

Looks to be transformed into:

#if FOO
int aaaaa  = 12;
int bbbbbb = 12;
#else
int a = 12;
#endif

#if BAR
int aaavvvvvaaa = 12; // ABC
#else
int a = 12; // ABC
#endif

#if BAZ
int aaavvvvvaaa = 12;
#if FOO
int aaaaaavvvvvaaa = 12;
#endif
#else
int a = 12;
#endif

#if A
#else
int aaaaaaaaaaaa = 12; // ABC
#endif
#if B
#else
int a = 12; // ABC
#endif
#if B
#else
int verylongnamethatshouldalign = 12; // ABC
#endif

Which I think is your point, that the alignment shouldn't breach #endif and it definitely shouldn't inherit the alignment from something further down. (if I've understood correctly)

From my perspective this LGTM, I'm impressed.

You may want to give some others a couple of days. (I'm guessing you don't have commit access correct?)

I'd like to add @VelocityRa & @enyquist as they wrote the AlignConsecutiveMacros work and probably have a greater insight than me into the problems around this area.

This revision is now accepted and ready to land.May 5 2020, 2:41 AM

@MyDeveloperDay, you're right about what this issue addresses: it surprised me a lot when an unrelated edit caused something to 'randomly' add spaces elsewhere, since it's in a different ifdef block (whether or not it has the same condition). The code that's aligned should always be across lines that are visually consecutive, and those ones weren't (in most cases, they aren't even logically consecutive).

You're also right that I don't have commit access; this is my first commit in LLVM.

I'm struggling with the simplicity of your solution ;-) (that's a compliment!)

Thank you!

@MyDeveloperDay, you're right about what this issue addresses: it surprised me a lot when an unrelated edit caused something to 'randomly' add spaces elsewhere, since it's in a different ifdef block (whether or not it has the same condition). The code that's aligned should always be across lines that are visually consecutive, and those ones weren't (in most cases, they aren't even logically consecutive).

You're also right that I don't have commit access; this is my first commit in LLVM.

I'm struggling with the simplicity of your solution ;-) (that's a compliment!)

Thank you!

You are welcome, Given your obvious level of understanding could I encourage you to gain commit access? (its a relative simple process)

The reason I ask is we need people able to review and accept code reviews (this requires commit access), I wonder if you'd be interested in participating in future reviews?

Its always nice to have new people on board.

JakeMerdichAMD added a project: Restricted Project.May 5 2020, 6:49 AM

Sure, I'll get started on that. It mainly comes from charging headfirst into the edge cases, but I think I have a decent grasp of clang-format internals now, and I'm definitely interested in both contributing to and reviewing for it.

Sure, I'll get started on that. It mainly comes from charging headfirst into the edge cases, but I think I have a decent grasp of clang-format internals now, and I'm definitely interested in both contributing to and reviewing for it.

I'm going to start adding you to reviews, its nice to be able to increase the pool of reviewers we can use, I tend to work on the small defects from the bug tracker and I feel bad about over burdening people.

Hey, @MyDeveloperDay, can I get your assistance in committing this? It's probably been long enough for anyone to chime in.

This revision was automatically updated to reflect the committed changes.

Thanks for the commit and review @MyDeveloperDay!

MyDeveloperDay added inline comments.Sep 21 2020, 6:50 AM
clang/lib/Format/UnwrappedLineParser.cpp
2987

If the line ends with a comment and we have align trailing comments turned on then I think this breaks the alignment

MyDeveloperDay added inline comments.Sep 21 2020, 8:15 AM
clang/lib/Format/UnwrappedLineParser.cpp
2987

If I comment out this line, then all the tests pass! and https://bugs.llvm.org/show_bug.cgi?id=47589 is resolved. I think we need to understand why we are putting this here?

void UnwrappedLineParser::pushToken(FormatToken *Tok) {
  Line->Tokens.push_back(UnwrappedLineNode(Tok));
  if (MustBreakBeforeNextToken) {
    Line->Tokens.back().Tok->MustBreakBefore = true;
    // Line->Tokens.back().Tok->MustBreakAlignBefore = true;
    MustBreakBeforeNextToken = false;
  }
}

I'd be very surprised if any of the tests included in this change pass with that line commented.... it's meant so that things like #if and #else properly separate alignment after the first preprocessor run, where the whitespace manager doesn't have the full context of things between it.

This will break with any PP statement though, not just if-elif-else-endif. I'll try moving it to somewhere more specific like the cases in parsePPDirective and put out a fix.

@JakeMerdichAMD @MyDeveloperDay Do you know if the regression has been fixed?
thanks

MyDeveloperDay added a comment.EditedOct 9 2020, 2:30 AM

Not to my knowledge, if we are not going to fix it we should probably revert it for now, we can bring it back later

Not to my knowledge, if we are not going to fix it we should probably revert it for now, we can bring it back later

I'm a strong supporter of reverting it for the moment.

@JakeMerdichAMD as the regression caused by this change is way more significant than the initial bug, I am planning to revert this patch in the next few days (both in master and 11).
Please let me know if you have any objection.

Belated 'sounds good to me'.

Other things have been keeping me busy, sorry for the delayed response on this.