This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Fix inconsistent identification of operator&
ClosedPublic

Authored by dkt01 on Jan 17 2023, 11:53 AM.

Details

Summary

Token annotator for clang-format incorrectly identifies operator& as a reference type in situations like Boost serialization archives.

Add annotation rules for standalone and chained operator& instances while preserving behavior for reference declarations at class scope. Add tests to validate annotation and formatting behavior.

Diff Detail

Event Timeline

dkt01 created this revision.Jan 17 2023, 11:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2023, 11:53 AM
dkt01 requested review of this revision.Jan 17 2023, 11:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2023, 11:53 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Could this be added to Context instead of a new type?

clang/lib/Format/TokenAnnotator.cpp
1205

How does this happen? The only push I can see is before parseBrace, where a pop is following directly after.

1674
1675

Full stop at the end of your comments.

1937

You don't need to pass that, you can access it directly from determineStarAmpUsage, right?

2385

That const doesn't seem fit with the LLVM style.

2385

I'd prefer you write it out.

2490–2491

What about

struct S {
  auto Mem = C & D;
};

That would be annotated, or is there another rule above with would kick in? If there isn't a test with such a construct already please add one.

2496–2502

That seems better.

2544

Why static?

dkt01 marked 6 inline comments as done.Jan 17 2023, 3:39 PM
dkt01 added inline comments.
clang/lib/Format/TokenAnnotator.cpp
2544

The AnnotatingParser object lifetime is too short to track the scope information I wanted. From my testing, it seems like this object is created once per line. This is why I haven't added the scope information as part of the contexts member.

I would like to make this a non-static member, but I'm not sure which object it should be a member of. Do you have a recommendation?

dkt01 added inline comments.Jan 17 2023, 3:39 PM
clang/lib/Format/TokenAnnotator.cpp
2490–2491

This case was already handled properly by the IsExpression && !Contexts.back().CaretFound condition, but I'll add this test case as there isn't an equivalent test yet.

dkt01 updated this revision to Diff 489962.Jan 17 2023, 3:41 PM
dkt01 added a reviewer: HazardyKnusperkeks.

Changes based on HazardyKnusperkeks' review

dkt01 marked an inline comment as done and an inline comment as not done.Jan 17 2023, 3:48 PM
dkt01 updated this revision to Diff 489967.Jan 17 2023, 4:01 PM

Further changes based on HazardyKnusperkeks' review

dkt01 marked an inline comment as done.Jan 17 2023, 4:09 PM
dkt01 added inline comments.
clang/lib/Format/TokenAnnotator.cpp
1205

I previously had a double pop for some braces, but now the checks are no longer necessary as pushes and pops match.

MyDeveloperDay added inline comments.Jan 18 2023, 6:27 AM
clang/unittests/Format/TokenAnnotatorTest.cpp
181

how does this differentiate from

MyType & val2;

is that a binary operator? I don't think so?

dkt01 marked an inline comment as done.Jan 18 2023, 6:32 AM
dkt01 added inline comments.
clang/unittests/Format/TokenAnnotatorTest.cpp
181

The heuristic I'm using is that [symbol] & [symbol]; outside a class/struct declaration is more likely an instance of operator& than an uninitialized reference. Tests on line 206 and 215 demonstrate how the two cases differ.

clang/lib/Format/TokenAnnotator.cpp
2544

The TokenAnnotator? And then by reference to the parser.

2947

Unrelated.

clang/unittests/Format/TokenAnnotatorTest.cpp
181

The heuristic I'm using is that [symbol] & [symbol]; outside a class/struct declaration is more likely an instance of operator& than an uninitialized reference. Tests on line 206 and 215 demonstrate how the two cases differ.

I'm with this one. A reference without initialization is an error.

Can you add a test with an assignment, I didn't see one right now.

dkt01 marked an inline comment as done.Jan 20 2023, 8:03 AM
dkt01 added inline comments.
clang/lib/Format/TokenAnnotator.cpp
2544

I was over-thinking it... Adding to TokenAnnotator works great!
I did have to re-introduce the size checks before popping scopes because otherwise the UnbalancedStructuralElements test fails.

clang/unittests/Format/TokenAnnotatorTest.cpp
181

Added two new tests for assignment to pointer using unary operator & and initializing assignment to reference because neither of these assignments was in existing format tests I could find.

dkt01 updated this revision to Diff 490859.Jan 20 2023, 8:03 AM

Changes as suggested by HazardyKnusperkeks in review.

clang/lib/Format/TokenAnnotator.cpp
850

See below

851

Is this check also necessary? Here we should only be when we have pushed before.
I'd change the if for an assert.

1205

Move the comment in the if? In general it hasn't to be a unbalanced brace, right?

dkt01 updated this revision to Diff 491823.Jan 24 2023, 8:35 AM
dkt01 marked 3 inline comments as done.

Changes recommended by HazardyKnusperkeks in review.

HazardyKnusperkeks added a reviewer: rymiel.

Looks good to me, but please wait until another one has given their blessing, or at least no veto for a few days.

This revision is now accepted and ready to land.Jan 24 2023, 12:55 PM

There is part of me that feels we should not rush this in to get it into 16 (which was due to branch today), I feel it might be a good idea to commit this after branching, and let it live in 'main' so we can take for a real test drive

dkt01 added a comment.Jan 24 2023, 9:07 PM

Sounds good. I don't believe I have permissions to add commits to three repo anyway, so someone can add on my behalf after the 16 branch.

I think they have branched (https://clang.llvm.org/docs/ClangFormatStyleOptions.html) the documentation has changed to 17.0

We would need your name and email in order to commit on your behalf.

I checked this patch out and tested it on my large project and I didn't see anything immediately jump out.

@owenpan, @rymiel any concerns?

dkt01 added a comment.Jan 25 2023, 9:10 PM

David Turner <turner_david_k@cat.com>

Good to know it worked as expected on your project as well.

Good to know it worked as expected on your project as well.

I have some concerns (mainly because it feels like it could be quite invasive), hence I'm delaying on the LGTM, I really want to run this on a large code base, which I think if we commit post the branch (which is now branched) we would probably see more easily. Historically in the past I think the firefox team have kindly picked up recent builds (no obligations) and catch any regressions.

clang/lib/Format/TokenAnnotator.cpp
2766

was there a reason it could no longer be const?

owenpan added inline comments.Jan 26 2023, 2:19 AM
clang/lib/Format/TokenAnnotator.cpp
118–119

Please keep them in the same order as that of the parameters above.

851

Should we assert that the token type of OpeningBrace matches the ScopeType before popping?

1161–1168

Do we really need these? If we add other special l_brace token types, we will have to update this list.

1207

Can you leave the comment at the top of the case? If we get rid of None for ScopeType, we don't need to pop here. (See below.)

2497–2499

Can you add a lambda for this so that it can be used below?

2500–2502

NextToken is guarded against null on line 2488 but not on lines 2489-2490.

2503–2504

Here we can use the lambda suggested above.

2799

Instead of pushing None, would it better to leave Scopes empty? Then we would not need None for the ScopeType, and instead of checking for Scopes.size() > 1, we could check if it's empty().

clang/lib/Format/TokenAnnotator.h
183 ↗(On Diff #491823)

Drop std::.

clang/unittests/Format/TokenAnnotatorTest.cpp
192–194

Can you add a case for periodstar?

207–213

Can you add a case for arrowstar?

dkt01 marked 11 inline comments as done.Jan 26 2023, 9:20 AM
dkt01 added inline comments.
clang/lib/Format/TokenAnnotator.cpp
1161–1168

Agreed. This was left over from a prior implementation that didn't have a default case.

2500–2502

If NextToken is null on 2488, 2489-2490 will be short circuited.

2766

Member Scopes can be modified. Alternate would be to make Scopes mutable, but I thought removing const from this function is cleaner

dkt01 updated this revision to Diff 492482.Jan 26 2023, 9:21 AM
dkt01 marked an inline comment as done.

Changes as suggested in review from owenpan

owenpan added inline comments.Jan 27 2023, 10:07 PM
clang/lib/Format/TokenAnnotator.cpp
115–117

Something like the above. (I prefer Scopes to TrackedScopes to be consistent with the naming of the other parameters.)

1205–1208

I don't think it's about unbalanced braces here.

2495–2496

To help simplify the if statement below.

2500–2502

You are right. The redundant parens threw me off.

2501

Redundant parens.

2502–2503

The lambda would check that PrevToken is nonnull.

2504–2508

Something like that.

clang/lib/Format/TokenAnnotator.h
36 ↗(On Diff #492482)

Consider moving ScopeType out of TokenAnnotator and keeping it consistent with the style of LineType above (and replacing TokenAnnotator::ScopeType with ScopeType everywhere).

184–191 ↗(On Diff #492482)

See above.

dkt01 marked 11 inline comments as done.Jan 29 2023, 9:21 AM
dkt01 added inline comments.
clang/lib/Format/TokenAnnotator.cpp
1205–1208

if (!Scopes.empty()) handles unbalanced braces. if(Tok->Previous) handles the case where a line starts with an rbrace.

2502–2503

!PrevToken->getPreviousNonComment() || is a different check than the null check in the lambda. It's acceptable to have no token two tokens prior because that indicates the ampersand is the second token of the line.

dkt01 updated this revision to Diff 493104.Jan 29 2023, 9:22 AM

Updates suggested in owenpan's review.

owenpan added inline comments.Feb 1 2023, 12:59 PM
clang/lib/Format/TokenAnnotator.cpp
122–124
128
851
1151–1163
1205–1208

I can't think of an example. Do you have one?

2495

Sorry about that!

2502–2503

I actually wanted fold this into the lambda. (See above.)

2502–2503

Now we only need the lambda.

dkt01 marked 8 inline comments as done.Feb 1 2023, 1:40 PM
dkt01 added inline comments.
clang/lib/Format/TokenAnnotator.cpp
1205–1208

The format unit test FormatUnbalancedStructuralElements feeds in strings that have only right braces, so without this check the pop fails.

dkt01 updated this revision to Diff 494059.Feb 1 2023, 1:40 PM

Updates recommended by owenpan in review.

owenpan accepted this revision.Feb 4 2023, 3:59 AM
owenpan added inline comments.
clang/lib/Format/TokenAnnotator.cpp
122

As suggested before.

1205–1208

Thanks!

dkt01 marked 4 inline comments as done.Feb 5 2023, 9:01 AM
dkt01 added inline comments.
clang/lib/Format/TokenAnnotator.cpp
122

Oops... missed the consts last time...

dkt01 updated this revision to Diff 494928.Feb 5 2023, 9:02 AM
dkt01 marked an inline comment as done.

Changes requested in owenpan's review.
Rebased changes onto latest main branch.

This revision was landed with ongoing or failed builds.Feb 5 2023, 1:33 PM
This revision was automatically updated to reflect the committed changes.