This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Improve require and concept handling
ClosedPublic

Authored by HazardyKnusperkeks on Nov 5 2021, 2:03 PM.

Details

Summary
  • Added an option where to put the requires clauses.
  • Renamed IndentRequires to IndentRequiresClause.
  • Changed BreakBeforeConceptDeclaration from bool to an enum.

Fixes https://llvm.org/PR32165, and https://llvm.org/PR52401.

Diff Detail

Event Timeline

HazardyKnusperkeks requested review of this revision.Nov 5 2021, 2:03 PM
HazardyKnusperkeks created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2021, 2:03 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Or am I mistaken and we want to be able to control the line break?

Also I would break before struct, class, union, but I would have needed to change the test, and I know that we at least need to talk about in in front. :)

template<typename T>
require X<T>
class ...

vs.

template<typename T>
require X<T> class ...
clang/lib/Format/TokenAnnotator.cpp
1418–1424

This one has cost me a lot of time, should we rethink this?

I feel anything you do here would be an improvement. its probably worth looking at recent issues logged around this issue at the same time

https://bugs.llvm.org/show_bug.cgi?id=52401
https://bugs.llvm.org/show_bug.cgi?id=32165

When I added this support its was basically to prevent it from mascaraing concepts, take my implementation with a grain of salt if you think it needs changing please feel free to propose it.

MyDeveloperDay accepted this revision.Nov 8 2021, 4:41 AM
This revision is now accepted and ready to land.Nov 8 2021, 4:41 AM
HazardyKnusperkeks planned changes to this revision.Nov 8 2021, 12:10 PM

I feel anything you do here would be an improvement. its probably worth looking at recent issues logged around this issue at the same time

https://bugs.llvm.org/show_bug.cgi?id=52401
https://bugs.llvm.org/show_bug.cgi?id=32165

When I added this support its was basically to prevent it from mascaraing concepts, take my implementation with a grain of salt if you think it needs changing please feel free to propose it.

I will take a look at the issues and incorporate them. I think I will add a few new options.

HazardyKnusperkeks retitled this revision from [clang-format] Improve require handling to [clang-format] Improve require and concept handling.
HazardyKnusperkeks edited the summary of this revision. (Show Details)

Update on my work in progress. The new test cases are all passed. The old ones are only kept to pick some lines from them.

I would like to hear some feedback while I finish the rest, mostly writing tests.

This revision is now accepted and ready to land.Dec 3 2021, 12:26 PM

Still WIP, not marking as changes planned, so that it pops up on your list and you can give me feedback. :)

clang/lib/Format/Format.cpp
1239

What are your opinions on that?

clang/lib/Format/UnwrappedLineParser.cpp
431–455

Does anyone know for sure what the level is? I have an idea, but I'm not 100% sure yet.

2829

Here I have an requires for a requires clause which did not get a type. Same as requires for requires expressions right now.

Should I tag all of them, even if I do not use the type right now?

2933–2934

This will be removed.

clang/unittests/Format/FormatTest.cpp
23674–23676

Should these lines be indented because requires is indented, or not?

Quuxplusone added inline comments.
clang/docs/ClangFormatStyleOptions.rst
3463

s/delca/decla/g
s/Preceeding/Preceding/g

3503–3522

This option strikes me as actively harmful; I think you should remove it from the PR completely. Nobody does this today and nobody should do it in the future either.

clang/include/clang/Format/Format.h
2524–2526

Comma-splice. But I don't think you need to change this comment at all. It's obvious that "indent" applies only to constructs at the left side of the (logical) source line.

3155–3161

What about for variable templates? What about for alias templates? What about for deduction guides?
It makes sense to me to have one option for this. It doesn't make sense to have 2, 3, or 4. Having 5 feels insane. So, I think you should just have one option that applies consistently to all requires-clauses everywhere in the code.

clang/lib/Format/Format.cpp
1239

+1 RequiresClausePosition=OwnLine, but also IndentRequires=true.

clang/unittests/Format/FormatTest.cpp
23231–23234

Nit: I was initially very confused by the formatting here, until I realized that lines 22289 and 22290 are two lines physically but represent one single line in the test case. Strongly recommend eliminating the gratuitous line break.

23317–23321

This looks wrong. Current:

template <typename T>\n"
concept C = decltype([]() { return std::true_type{}; }())::value &&
            requires(T t) {
  t.bar();
} && sizeof(T) <= 8;

but should be:

template <typename T>\n"
concept C = decltype([]() { return std::true_type{}; }())::value &&
  requires(T t) {
    t.bar();
  } && sizeof(T) <= 8;

(assuming that all the relevant indent-widths are set to 2).

23490

Currently:

template <std::semiregular F, std::semiregular... Args>
requires(std::invocable<F, std::invoke_result_t<Args>...>)
struct constant;

Should be:

template <std::semiregular F, std::semiregular... Args>
requires (std::invocable<F, std::invoke_result_t<Args>...>)
struct constant;

(notice the extra single space). Add a TODO comment and ship it anyway?

23595–23600

Data point: libc++ has a mix of requires (T t) { } and requires(T t) { }. Am I correct that clang-format is not yet smart enough to have a meaningful opinion about this particular whitespace?
I would bet that two years ago I was doing requires(T t) { }, but these days requires (T t) { } is much more natural to me.

23674–23676

Should these lines be indented because requires is indented, or not?

23674–23676

Yes.

template<typename T>
  requires requires (T t) {
    typename T::Bar;
    { t.bar(); } -> std::same_as<bool>;
  }
class Foo {
};

(And in general it would be nice to simplify all of these test cases: lines 22707–22709, 22719–22721, etc. definitely aren't needed, and I even question whether line 22716 is germane to the point of this test case. Ideally, each test case will regression-test a single specific functionality or codepath, with no unnecessary digressions.)

23766

s/tyename/typename/g

23771–23773

This result of RequiresClausePosition=SingleLine, IndentRequires=true feels bizarre to me. I'm 99% sure that RequiresClausePosition=SingleLine should implicitly disable IndentRequires, because IndentRequires applies only to requires-clauses at the beginning of a logical source line, and with SingleLine a requires-clause is never at the beginning of a logical source line.
The fact that this requires keyword accidentally fell at the beginning of a physical source line should be completely irrelevant to the algorithm.

We should test a progression that looks like

template<VeryLongName F> requires
true struct S;
template<VeryLongName123456789 F>
requires true struct S;
template<VeryLongName1234567890 F
> requires true struct S;

(But also, SingleLine is a bad style and it would be cool to just not support it at all. Nobody should use it.)

24227–24229

(1) That return on line 23268 is sneaky. 😛
(2) LLVM style should certainly set IndentRequires=true, so this test shouldn't pass without some modification to line 23273.

HazardyKnusperkeks marked 4 inline comments as done.Dec 3 2021, 4:19 PM
HazardyKnusperkeks added inline comments.
clang/docs/ClangFormatStyleOptions.rst
3503–3522

I will never use it, so I have no strong opinion in that, because it was really easy to add. I will let others decide if we want to offer it.

clang/include/clang/Format/Format.h
3155–3161

That makes sense, in my defense I thought about it and came to the conclusion there would be no need for requires on variable templates, and the others I did not think of.

Why I have chosen to options is maybe someone wants

template <typename T>
requires ...
class ...

But also

template <typename T>
void foo(T) requires ... {

What's your opinion on that?

clang/unittests/Format/FormatTest.cpp
23231–23234

I'm not really happy wit the two solutions which come to my mind:
Using a Style with a lower column limit, or using // clang format off.

23317–23321

For me personally it should look

template <typename T>\n"
concept C = decltype([]() { return std::true_type{}; }())::value &&
            requires(T t) {
              t.bar();
            } && sizeof(T) <= 8;

I have not taken any action to manipulate the position, this is what dropped out of clang-format naturally. And I think the requires should start below the decltype the same as all other conditions would. If the body should be indented is open for discussion.

23490

Take a look at D113369 which will most likely land simultaneously with this change, but currently needs to be updated.

23595–23600

See above.

24227–24229

This test case will be gone, after I'm done. I just keep it to look for some variants I might have missed in writing new tests.

About IndentRequires I have strong opinion, other than changing what is released feels a bit wrong. (Since as far as I can see LLVM Style does not mention requires at all.)

Quuxplusone added inline comments.Dec 4 2021, 7:14 AM
clang/docs/ClangFormatStyleOptions.rst
3503–3522

it was really easy to add

Even easier not to add. ;)

clang/include/clang/Format/Format.h
3155–3161

Well, I see that there is a machine-readable distinction between "requires after template" versus "requires after function parameter list," i.e. between

template<class T> requires X
void f(T);

template<class T>
void f(T) requires X;

The latter position is not (currently) valid for non-functions (e.g. template<class T> int v requires X; is a syntax error).
However, as usual, I don't think that the user should want to format these differently.
If you're using clang-format to format your code, then you should definitely want function-parameter-list-requires-clauses on their own line for sure, to avoid confusing strings of tokens such as

void f() const noexcept requires foobar<T>;  // bad

void f() const noexcept  // better
  requires foobar<T>;

If you do that, then (in my proposed world) you would also automatically get

template<class T>
  requires foobar<T>  // automatically
void f();

template<class T> requires foobar<T>  // can't get this
void f();

template<foobar T>  // could also get this, via manual intervention
void f();

And personally I see nothing wrong with that state of affairs.

clang/unittests/Format/FormatTest.cpp
23231–23234

I guess the higher-level question (which I didn't think of until seeing many other tests) is, redesign this test case to be more targeted and thus shorter. E.g. could you s/std::trait<T>::value/Bar/ or would that destroy the point of the test? (I'm not sure because I'm not sure what the point of the test is.)

23317–23321

OK, if clang-format by default likes to align subexpressions vertically like that, then I agree with your formatting. Either way, though, what it's currently producing is incorrect. :)

23490

Thanks, that resolves this comment! (I left some review comments over there: I agree with that one's two-option approach even though I suggest better names for the two options.)

24227–24229

other than changing what is released feels a bit wrong

IIUC, nothing is released in this area right now. clang-format doesn't indent requires-clauses, sure, but that's because it doesn't understand the keyword. Once clang-format understands C++20, it makes perfect sense to me that we'd teach it to format C++20 in LLVM style, as opposed to formatting C++20 in "a style designed merely to imitate the previous buggy behavior." That'd be like... neoclassical buildings and sculptures being painted white to imitate ancient buildings and sculptures whose paint had worn off. It's not that the ancients liked white (it's not that LLVM 13 liked unindented requires-clauses); it's that their surviving artifacts (clang13-formatted codebases) reflect only a "decayed" version of their actual preferences, because of limitations in their technology.

HazardyKnusperkeks marked 15 inline comments as done.Feb 4 2022, 11:00 AM
HazardyKnusperkeks added inline comments.
clang/include/clang/Format/Format.h
3155–3161

I still can imagine someone maybe wants it different, but for now I go with your suggestion, only one option for all clauses.

Thanks for working on this! I love to see better requires support in clang-format. I didn't do a real review, just some drive-by comments.

clang/docs/ClangFormatStyleOptions.rst
3438

I assume that should clang-format 15 now. Same for other new options.

clang/lib/Format/Format.cpp
1238

Note the libc++ already uses C++20 code. We don't use clang-format officially, but I use it for new code. (Mainly the std::format related code.)

HazardyKnusperkeks marked an inline comment as done.
HazardyKnusperkeks edited the summary of this revision. (Show Details)

This took a lot of work and debugging.
I tried to minimize the refactoring on related code (I've prepared quite a few other commits and stashes...).

I'm confident that we can now format every sane use of requires clauses, expressions, and concept declarations. And even some I wouldn't call sane.

Decided to drop https://llvm.org/PR32166 since in this approach the book keeping to bring the indention back on track after the clause would be too much burden on everyone not using it.

HazardyKnusperkeks requested review of this revision.Feb 4 2022, 11:18 AM
HazardyKnusperkeks marked 3 inline comments as done.Feb 4 2022, 11:44 AM

Format notes updated on local commit.

clang/docs/ClangFormatStyleOptions.rst
3438

Yeah, you hit the small margin between my reply and my upload of the current diff. ;)

clang/lib/Format/Format.cpp
1238

And is there a consensus about formatting requires clauses?

Woow, that's a mountain of work!
All my comments are non-blocking.

clang/docs/ClangFormatStyleOptions.rst
1992
1997
2004–2005
2008
2710–2711
3500
3517

Maybe WithPreceding makes a little more sense without context?

3518–3520

Just a suggestion, I'm not a writer. I'd just like to see something clear and comprehensible.

3534
3535–3536
3551–3552

I'm really not fond of writing that "clauses try to do something" :).

3556–3560

It would be cool to have an example with long template, long parameter list and a long requires clause too.

clang/docs/ReleaseNotes.rst
170
clang/lib/Format/ContinuationIndenter.cpp
484–504
1506–1507

And why it is not in Indent?

clang/lib/Format/TokenAnnotator.cpp
3924–3932

The body of seems to be the exact same code as below in lines 3920-3926.
Maybe you can refactor to something like this?
Name of the lambda in my suggestion is maybe incorrect though.

3943–3951

And then here as well.

clang/lib/Format/UnwrappedLineParser.cpp
439–452

Haha, I guess it can be anything! :)

731

Why plural?

802–804

As you have a comment, put braces around the body of the if.

2156–2159
2769–2770

Doesn't it mean that it always returns? :) Not sure what the value of this comment is.
Maybe it should return a bool to indicate success/failure?

2799

Not a big fan of a double negation, but it rests acceptable here.

clang/unittests/Format/FormatTest.cpp
3829–3833

Do you test what was here previously with RequiresClausePosition: SingleLine (or whichever relevant option)?

19954

Please add a comment that true and false are for backward compatibility. It helps when grepping.

23317–23320

How about adding a test case with the exact same concept body but surrounded with parens (e.g. using decltype(...))?
The one below looks *almost* the same, but uses std::true_type and ::value and so it's harder to transpose to what this test should look like.

23484

Hmm, no space after requires here?
It looks strange.
My preference would be not to put a space before a parameter list (as in requires(T t)), but put a space before of a constraint expression.
It seems to be the style used in https://en.cppreference.com/w/cpp/language/constraints.
Unless there are options that modify this behaviour?

23484

Oh, I've just seen a comment below on basically the same topic.

HazardyKnusperkeks marked 22 inline comments as done.Feb 5 2022, 1:52 AM
HazardyKnusperkeks added inline comments.
clang/docs/ClangFormatStyleOptions.rst
3518–3520

I'm happy to reword the stuff. :)

clang/lib/Format/ContinuationIndenter.cpp
1506–1507

This one I tackled a month ago...
If I remember correctly this is because the indentation is not from an increased `Level of the line, but because the position comes out of getNewLineColumn`.
The Level would only work if I would generate different UnwrappedLines depending on the clause style, which currently I don't.
The `Indent is correctly reduced, but LastSpace` stayed.

clang/lib/Format/TokenAnnotator.cpp
3924–3932

Here is not about indentation, it is about breaking the line. And the difference is `WithFollowing vs. WithPreceding`.

clang/lib/Format/UnwrappedLineParser.cpp
731

Because it applies to all braces found within this block (not nested).

clang/unittests/Format/FormatTest.cpp
3829–3833

I don't understand the question.

Before we had no real handling of requires, which did something like a combination of `SingleLine and WithFollowing`.

I could have left the string in the test unchanged, but would have to set the style to `SingleLine`.

I thought it would be acceptable to change the test, since we basically only now defined a default for LLVMStyle.

23317–23320

Where should the parens go?
The misformatting is that sizeof is indented below bool, not below the l_square.

23484

Yeah, the change for that is D113369 which I will update shortly.

We can gladly discuss the defaults there.

HazardyKnusperkeks marked 2 inline comments as done.
  • Incorporated review notes
  • Rebased
MyDeveloperDay accepted this revision.Feb 5 2022, 3:04 AM

@HazardyKnusperkeks I think this is tremendous, I think this looks and feels like a much more thorough approach to formatting concepts. Thank you so much for taking this on and for adding SO many unit tests that gives us lots of good examples, I'm probably not doing as an in-depth review as others because my own team aren't able to use a compiler on all platforms that allow us to use concepts and so my skill level around them isn't as great because I don't use them day to day. But I'm going to give this a LGTM and when the others are happy I'm more than happy.

My personal opinion is when reviews get this big, and we are making forward progress then I'm ok to "BANK!" what we have. My fear for large reviews is if we get a regression in some other behaviour and we have to say "What changed" and the response is "Everything Changed" then finding out why becomes so much harder. So from my perspective, I'd like to see this landed and any additional issues worked out in separate reviews.

This revision is now accepted and ready to land.Feb 5 2022, 3:04 AM
curdeius accepted this revision.Feb 5 2022, 4:24 AM

I can just say: ship it!

Thanks for all the work!

clang/lib/Format/ContinuationIndenter.cpp
1506–1507

Ok. Roger.

clang/lib/Format/TokenAnnotator.cpp
3924–3932

That's what I meant saying that the name might be incorrect.

clang/unittests/Format/FormatTest.cpp
3829–3833

Fair enough.

23317–23320

Oh, right.

HazardyKnusperkeks marked 7 inline comments as done.
  • Fixed use of a variable which slipped in from another change.
  • Rebased.

Small fix in the test. For this change it is not relevant, but for the next one.

Mordante added inline comments.Feb 8 2022, 8:58 AM
clang/lib/Format/Format.cpp
1238

No there's not. But maybe it's interesting to discuss that after the patch lands to see how the different options look in real code.

It appears that after this patch clang-format started breaking up requires in javascript, e.g.:

// before
function f() {
  var requires = {};
}
// after
function f() {
  var
    requires
  = {};
}

Could we restrict the requires logic only to (obj)C++? (normally I'd go and come up with a patch for cases like this, but this patch is quite large and it's not immediately clear where's the best place for this restriction).

It appears that this causes a regression by adding a space in-between pointer dereferences *p -> * p in some cases, e.g. formatting this with llvm style:

// before
void f() {
  while (p < a && *p == 'a')
    p++;
}
// after
void f() {
  while (p < a && * p == 'a')
    p++;
}

It appears that after this patch clang-format started breaking up requires in javascript, e.g.:

// before
function f() {
  var requires = {};
}
// after
function f() {
  var
    requires
  = {};
}

Could we restrict the requires logic only to (obj)C++? (normally I'd go and come up with a patch for cases like this, but this patch is quite large and it's not immediately clear where's the best place for this restriction).

Addressed in https://reviews.llvm.org/D120324.

It appears that this causes a regression by adding a space in-between pointer dereferences *p -> * p in some cases, e.g. formatting this with llvm style:

// before
void f() {
  while (p < a && *p == 'a')
    p++;
}
// after
void f() {
  while (p < a && * p == 'a')
    p++;
}

* in *p is indeed annotated as a binary operator now:

clang-format-main-20220110-5ff916ab72b26e667bd5d2e4a762650ba479c781--style=file --debug-only=format-token-annotator test.cpp
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=while L=5 PPK=2 FakeLParens= FakeRParens=0 II=0x2f4bf80a6a0 Text='while'
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=l_paren L=7 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
 M=0 C=1 T=Unknown S=0 F=0 B=0 BK=0 P=59 Name=identifier L=8 PPK=2 FakeLParens=10/5/ FakeRParens=0 II=0x2f4bf806fd8 Text='p'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=50 Name=less L=10 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='<'
 M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=50 Name=identifier L=12 PPK=2 FakeLParens= FakeRParens=1 II=0x2f4bf807008 Text='q'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=45 Name=ampamp L=15 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='&&'
 M=0 C=1 T=UnaryOperator S=1 F=0 B=0 BK=0 P=45 Name=star L=17 PPK=2 FakeLParens=0/ FakeRParens=0 II=0x0 Text='*'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=100 Name=identifier L=18 PPK=2 FakeLParens= FakeRParens=2 II=0x2f4bf806fd8 Text='p'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=43 Name=r_paren L=19 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
----
AnnotatedTokens(L=1):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=semi L=1 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
----
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=eof L=0 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=''
----
while (p < q && *p)
  ;

whereas it was (correctly) a unary operator:

clang-format-main-20220222-e0219872--style=file --debug-only=format-token-annotator test.cpp
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=while L=5 PPK=2 FakeLParens= FakeRParens=0 II=0x23639666fa0 Text='while'
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=l_paren L=7 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
 M=0 C=1 T=Unknown S=0 F=0 B=0 BK=0 P=59 Name=identifier L=8 PPK=2 FakeLParens=10/5/ FakeRParens=0 II=0x236396a30a0 Text='p'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=50 Name=less L=10 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='<'
 M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=50 Name=identifier L=12 PPK=2 FakeLParens= FakeRParens=1 II=0x236396a30d0 Text='q'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=45 Name=ampamp L=15 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='&&'
 M=0 C=1 T=BinaryOperator S=1 F=0 B=0 BK=0 P=45 Name=star L=17 PPK=2 FakeLParens=14/ FakeRParens=0 II=0x0 Text='*'
 M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=54 Name=identifier L=19 PPK=2 FakeLParens= FakeRParens=2 II=0x236396a30a0 Text='p'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=43 Name=r_paren L=20 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
----
AnnotatedTokens(L=1):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=semi L=1 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
----
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=eof L=0 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=''
----
while (p < q && * p)
  ;

It appears that this causes a regression by adding a space in-between pointer dereferences *p -> * p in some cases, e.g. formatting this with llvm style:

// before
void f() {
  while (p < a && *p == 'a')
    p++;
}
// after
void f() {
  while (p < a && * p == 'a')
    p++;
}

* in *p is indeed annotated as a binary operator now:

clang-format-main-20220110-5ff916ab72b26e667bd5d2e4a762650ba479c781--style=file --debug-only=format-token-annotator test.cpp
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=while L=5 PPK=2 FakeLParens= FakeRParens=0 II=0x2f4bf80a6a0 Text='while'
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=l_paren L=7 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
 M=0 C=1 T=Unknown S=0 F=0 B=0 BK=0 P=59 Name=identifier L=8 PPK=2 FakeLParens=10/5/ FakeRParens=0 II=0x2f4bf806fd8 Text='p'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=50 Name=less L=10 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='<'
 M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=50 Name=identifier L=12 PPK=2 FakeLParens= FakeRParens=1 II=0x2f4bf807008 Text='q'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=45 Name=ampamp L=15 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='&&'
 M=0 C=1 T=UnaryOperator S=1 F=0 B=0 BK=0 P=45 Name=star L=17 PPK=2 FakeLParens=0/ FakeRParens=0 II=0x0 Text='*'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=100 Name=identifier L=18 PPK=2 FakeLParens= FakeRParens=2 II=0x2f4bf806fd8 Text='p'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=43 Name=r_paren L=19 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
----
AnnotatedTokens(L=1):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=semi L=1 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
----
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=eof L=0 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=''
----
while (p < q && *p)
  ;

whereas it was (correctly) a unary operator:

clang-format-main-20220222-e0219872--style=file --debug-only=format-token-annotator test.cpp
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=while L=5 PPK=2 FakeLParens= FakeRParens=0 II=0x23639666fa0 Text='while'
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=l_paren L=7 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
 M=0 C=1 T=Unknown S=0 F=0 B=0 BK=0 P=59 Name=identifier L=8 PPK=2 FakeLParens=10/5/ FakeRParens=0 II=0x236396a30a0 Text='p'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=50 Name=less L=10 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='<'
 M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=50 Name=identifier L=12 PPK=2 FakeLParens= FakeRParens=1 II=0x236396a30d0 Text='q'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=45 Name=ampamp L=15 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='&&'
 M=0 C=1 T=BinaryOperator S=1 F=0 B=0 BK=0 P=45 Name=star L=17 PPK=2 FakeLParens=14/ FakeRParens=0 II=0x0 Text='*'
 M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=54 Name=identifier L=19 PPK=2 FakeLParens= FakeRParens=2 II=0x236396a30a0 Text='p'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=43 Name=r_paren L=20 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
----
AnnotatedTokens(L=1):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=semi L=1 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
----
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=eof L=0 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=''
----
while (p < q && * p)
  ;

That means someone guesses this is a binary op and guesses wrong. I try to take a look at it.

That means someone guesses this is a binary op and guesses wrong. I try to take a look at it.

Yeah, I just found out that it is set in determineStarAmpUsage here:

if (PrevToken->Tok.isLiteral() ||
    PrevToken->isOneOf(tok::r_paren, tok::r_square, tok::kw_true,
                       tok::kw_false, tok::r_brace) ||
    NextToken->Tok.isLiteral() ||
    NextToken->isOneOf(tok::kw_true, tok::kw_false) ||
    NextToken->isUnaryOperator() ||
    // If we know we're in a template argument, there are no named
    // declarations. Thus, having an identifier on the right-hand side
    // indicates a binary operator.
    (InTemplateArgument && NextToken->Tok.isAnyIdentifier()))
  return TT_BinaryOperator;

Strangely, InTemplateArgument is true here, but not sure how that happens, as it wasn't something you changed directly.

@HazardyKnusperkeks, forget my previous comment, I think I found what changed.
Before, the * was annotated twice, first as a BinaryOperator, then, on a second run, it was correctly annotated as a UnaryOperator.
In this revision you whitelisted BinaryOperator in resetTokenMetadata, so its type is determined once only.

Yep, confirmed, but I don't know how to fix it.
Just removing TT_BinaryOperator from resetTokenMetadata provokes these failures:

[  FAILED  ] FormatTest.Concepts
[  FAILED  ] FormatTest.RequiresClauses
[  FAILED  ] TokenAnnotatorTest.UnderstandsRequiresClausesAndConcepts

Hopefully you have some idea how to fix it...

Oh, BTW, here's the test for the regression that Krasimir found:

TEST_F(TokenAnnotatorTest, UnderstandsLoops) {
  auto Tokens = annotate("while (p < q && *p == 0) {}");
  EXPECT_EQ(Tokens.size(), 14u) << Tokens;
  EXPECT_TOKEN(Tokens[6], tok::star, TT_UnaryOperator);
}

Yep, confirmed, but I don't know how to fix it.
Just removing TT_BinaryOperator from resetTokenMetadata provokes these failures:

[  FAILED  ] FormatTest.Concepts
[  FAILED  ] FormatTest.RequiresClauses
[  FAILED  ] TokenAnnotatorTest.UnderstandsRequiresClausesAndConcepts

Hopefully you have some idea how to fix it...

Oh, BTW, here's the test for the regression that Krasimir found:

TEST_F(TokenAnnotatorTest, UnderstandsLoops) {
  auto Tokens = annotate("while (p < q && *p == 0) {}");
  EXPECT_EQ(Tokens.size(), 14u) << Tokens;
  EXPECT_TOKEN(Tokens[6], tok::star, TT_UnaryOperator);
}

This is addressed in D120511.