Page MenuHomePhabricator

[clang-format] Improve require and concept handling
ClosedPublic

Authored by HazardyKnusperkeks on Nov 5 2021, 2:03 PM.

Details

Summary
  • Added an option where to put the requires clauses.
  • Renamed IndentRequires to IndentRequiresClause.
  • Changed BreakBeforeConceptDeclaration from bool to an enum.

Fixes https://llvm.org/PR32165, and https://llvm.org/PR52401.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Quuxplusone added inline comments.Dec 4 2021, 7:14 AM
clang/docs/ClangFormatStyleOptions.rst
3464–3483

it was really easy to add

Even easier not to add. ;)

clang/include/clang/Format/Format.h
3130–3136

Well, I see that there is a machine-readable distinction between "requires after template" versus "requires after function parameter list," i.e. between

template<class T> requires X
void f(T);

template<class T>
void f(T) requires X;

The latter position is not (currently) valid for non-functions (e.g. template<class T> int v requires X; is a syntax error).
However, as usual, I don't think that the user should want to format these differently.
If you're using clang-format to format your code, then you should definitely want function-parameter-list-requires-clauses on their own line for sure, to avoid confusing strings of tokens such as

void f() const noexcept requires foobar<T>;  // bad

void f() const noexcept  // better
  requires foobar<T>;

If you do that, then (in my proposed world) you would also automatically get

template<class T>
  requires foobar<T>  // automatically
void f();

template<class T> requires foobar<T>  // can't get this
void f();

template<foobar T>  // could also get this, via manual intervention
void f();

And personally I see nothing wrong with that state of affairs.

clang/unittests/Format/FormatTest.cpp
22288–22291

I guess the higher-level question (which I didn't think of until seeing many other tests) is, redesign this test case to be more targeted and thus shorter. E.g. could you s/std::trait<T>::value/Bar/ or would that destroy the point of the test? (I'm not sure because I'm not sure what the point of the test is.)

22374–22378

OK, if clang-format by default likes to align subexpressions vertically like that, then I agree with your formatting. Either way, though, what it's currently producing is incorrect. :)

22597

Thanks, that resolves this comment! (I left some review comments over there: I agree with that one's two-option approach even though I suggest better names for the two options.)

23268–23270

other than changing what is released feels a bit wrong

IIUC, nothing is released in this area right now. clang-format doesn't indent requires-clauses, sure, but that's because it doesn't understand the keyword. Once clang-format understands C++20, it makes perfect sense to me that we'd teach it to format C++20 in LLVM style, as opposed to formatting C++20 in "a style designed merely to imitate the previous buggy behavior." That'd be like... neoclassical buildings and sculptures being painted white to imitate ancient buildings and sculptures whose paint had worn off. It's not that the ancients liked white (it's not that LLVM 13 liked unindented requires-clauses); it's that their surviving artifacts (clang13-formatted codebases) reflect only a "decayed" version of their actual preferences, because of limitations in their technology.

HazardyKnusperkeks marked 15 inline comments as done.Feb 4 2022, 11:00 AM
HazardyKnusperkeks added inline comments.
clang/include/clang/Format/Format.h
3130–3136

I still can imagine someone maybe wants it different, but for now I go with your suggestion, only one option for all clauses.

Thanks for working on this! I love to see better requires support in clang-format. I didn't do a real review, just some drive-by comments.

clang/docs/ClangFormatStyleOptions.rst
3399

I assume that should clang-format 15 now. Same for other new options.

clang/lib/Format/Format.cpp
1212

Note the libc++ already uses C++20 code. We don't use clang-format officially, but I use it for new code. (Mainly the std::format related code.)

HazardyKnusperkeks marked an inline comment as done.
HazardyKnusperkeks edited the summary of this revision. (Show Details)

This took a lot of work and debugging.
I tried to minimize the refactoring on related code (I've prepared quite a few other commits and stashes...).

I'm confident that we can now format every sane use of requires clauses, expressions, and concept declarations. And even some I wouldn't call sane.

Decided to drop https://llvm.org/PR32166 since in this approach the book keeping to bring the indention back on track after the clause would be too much burden on everyone not using it.

HazardyKnusperkeks requested review of this revision.Feb 4 2022, 11:18 AM
HazardyKnusperkeks marked 3 inline comments as done.Feb 4 2022, 11:44 AM

Format notes updated on local commit.

clang/docs/ClangFormatStyleOptions.rst
3399

Yeah, you hit the small margin between my reply and my upload of the current diff. ;)

clang/lib/Format/Format.cpp
1212

And is there a consensus about formatting requires clauses?

Woow, that's a mountain of work!
All my comments are non-blocking.

clang/docs/ClangFormatStyleOptions.rst
1973
1977
1984–1985
1988
2675–2676
3405
3422

Maybe WithPreceding makes a little more sense without context?

3423–3425

Just a suggestion, I'm not a writer. I'd just like to see something clear and comprehensible.

3439
3440–3441
3456–3457

I'm really not fond of writing that "clauses try to do something" :).

3461–3465

It would be cool to have an example with long template, long parameter list and a long requires clause too.

clang/docs/ReleaseNotes.rst
251
clang/lib/Format/ContinuationIndenter.cpp
487
1465–1466

And why it is not in Indent?

clang/lib/Format/TokenAnnotator.cpp
3849–3857

The body of seems to be the exact same code as below in lines 3920-3926.
Maybe you can refactor to something like this?
Name of the lambda in my suggestion is maybe incorrect though.

3882–3890

And then here as well.

clang/lib/Format/UnwrappedLineParser.cpp
434–445

Haha, I guess it can be anything! :)

697

Why plural?

756–758

As you have a comment, put braces around the body of the if.

2110–2113
2582–2583

Doesn't it mean that it always returns? :) Not sure what the value of this comment is.
Maybe it should return a bool to indicate success/failure?

2667

Not a big fan of a double negation, but it rests acceptable here.

clang/unittests/Format/FormatTest.cpp
3496–3497

Do you test what was here previously with RequiresClausePosition: SingleLine (or whichever relevant option)?

19162

Please add a comment that true and false are for backward compatibility. It helps when grepping.

22542–22545

How about adding a test case with the exact same concept body but surrounded with parens (e.g. using decltype(...))?
The one below looks *almost* the same, but uses std::true_type and ::value and so it's harder to transpose to what this test should look like.

22704

Hmm, no space after requires here?
It looks strange.
My preference would be not to put a space before a parameter list (as in requires(T t)), but put a space before of a constraint expression.
It seems to be the style used in https://en.cppreference.com/w/cpp/language/constraints.
Unless there are options that modify this behaviour?

22704

Oh, I've just seen a comment below on basically the same topic.

HazardyKnusperkeks marked 22 inline comments as done.Feb 5 2022, 1:52 AM
HazardyKnusperkeks added inline comments.
clang/docs/ClangFormatStyleOptions.rst
3423–3425

I'm happy to reword the stuff. :)

clang/lib/Format/ContinuationIndenter.cpp
1465–1466

This one I tackled a month ago...
If I remember correctly this is because the indentation is not from an increased `Level of the line, but because the position comes out of getNewLineColumn`.
The Level would only work if I would generate different UnwrappedLines depending on the clause style, which currently I don't.
The `Indent is correctly reduced, but LastSpace` stayed.

clang/lib/Format/TokenAnnotator.cpp
3849–3857

Here is not about indentation, it is about breaking the line. And the difference is `WithFollowing vs. WithPreceding`.

clang/lib/Format/UnwrappedLineParser.cpp
697

Because it applies to all braces found within this block (not nested).

clang/unittests/Format/FormatTest.cpp
3496–3497

I don't understand the question.

Before we had no real handling of requires, which did something like a combination of `SingleLine and WithFollowing`.

I could have left the string in the test unchanged, but would have to set the style to `SingleLine`.

I thought it would be acceptable to change the test, since we basically only now defined a default for LLVMStyle.

22542–22545

Where should the parens go?
The misformatting is that sizeof is indented below bool, not below the l_square.

22704

Yeah, the change for that is D113369 which I will update shortly.

We can gladly discuss the defaults there.

HazardyKnusperkeks marked 2 inline comments as done.
  • Incorporated review notes
  • Rebased
MyDeveloperDay accepted this revision.Feb 5 2022, 3:04 AM

@HazardyKnusperkeks I think this is tremendous, I think this looks and feels like a much more thorough approach to formatting concepts. Thank you so much for taking this on and for adding SO many unit tests that gives us lots of good examples, I'm probably not doing as an in-depth review as others because my own team aren't able to use a compiler on all platforms that allow us to use concepts and so my skill level around them isn't as great because I don't use them day to day. But I'm going to give this a LGTM and when the others are happy I'm more than happy.

My personal opinion is when reviews get this big, and we are making forward progress then I'm ok to "BANK!" what we have. My fear for large reviews is if we get a regression in some other behaviour and we have to say "What changed" and the response is "Everything Changed" then finding out why becomes so much harder. So from my perspective, I'd like to see this landed and any additional issues worked out in separate reviews.

This revision is now accepted and ready to land.Feb 5 2022, 3:04 AM
curdeius accepted this revision.Feb 5 2022, 4:24 AM

I can just say: ship it!

Thanks for all the work!

clang/lib/Format/ContinuationIndenter.cpp
1465–1466

Ok. Roger.

clang/lib/Format/TokenAnnotator.cpp
3849–3857

That's what I meant saying that the name might be incorrect.

clang/unittests/Format/FormatTest.cpp
3496–3497

Fair enough.

22542–22545

Oh, right.

HazardyKnusperkeks marked 7 inline comments as done.
  • Fixed use of a variable which slipped in from another change.
  • Rebased.

Small fix in the test. For this change it is not relevant, but for the next one.

Mordante added inline comments.Feb 8 2022, 8:58 AM
clang/lib/Format/Format.cpp
1212

No there's not. But maybe it's interesting to discuss that after the patch lands to see how the different options look in real code.

It appears that after this patch clang-format started breaking up requires in javascript, e.g.:

// before
function f() {
  var requires = {};
}
// after
function f() {
  var
    requires
  = {};
}

Could we restrict the requires logic only to (obj)C++? (normally I'd go and come up with a patch for cases like this, but this patch is quite large and it's not immediately clear where's the best place for this restriction).

It appears that this causes a regression by adding a space in-between pointer dereferences *p -> * p in some cases, e.g. formatting this with llvm style:

// before
void f() {
  while (p < a && *p == 'a')
    p++;
}
// after
void f() {
  while (p < a && * p == 'a')
    p++;
}

It appears that after this patch clang-format started breaking up requires in javascript, e.g.:

// before
function f() {
  var requires = {};
}
// after
function f() {
  var
    requires
  = {};
}

Could we restrict the requires logic only to (obj)C++? (normally I'd go and come up with a patch for cases like this, but this patch is quite large and it's not immediately clear where's the best place for this restriction).

Addressed in https://reviews.llvm.org/D120324.

It appears that this causes a regression by adding a space in-between pointer dereferences *p -> * p in some cases, e.g. formatting this with llvm style:

// before
void f() {
  while (p < a && *p == 'a')
    p++;
}
// after
void f() {
  while (p < a && * p == 'a')
    p++;
}

* in *p is indeed annotated as a binary operator now:

clang-format-main-20220110-5ff916ab72b26e667bd5d2e4a762650ba479c781--style=file --debug-only=format-token-annotator test.cpp
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=while L=5 PPK=2 FakeLParens= FakeRParens=0 II=0x2f4bf80a6a0 Text='while'
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=l_paren L=7 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
 M=0 C=1 T=Unknown S=0 F=0 B=0 BK=0 P=59 Name=identifier L=8 PPK=2 FakeLParens=10/5/ FakeRParens=0 II=0x2f4bf806fd8 Text='p'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=50 Name=less L=10 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='<'
 M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=50 Name=identifier L=12 PPK=2 FakeLParens= FakeRParens=1 II=0x2f4bf807008 Text='q'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=45 Name=ampamp L=15 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='&&'
 M=0 C=1 T=UnaryOperator S=1 F=0 B=0 BK=0 P=45 Name=star L=17 PPK=2 FakeLParens=0/ FakeRParens=0 II=0x0 Text='*'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=100 Name=identifier L=18 PPK=2 FakeLParens= FakeRParens=2 II=0x2f4bf806fd8 Text='p'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=43 Name=r_paren L=19 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
----
AnnotatedTokens(L=1):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=semi L=1 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
----
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=eof L=0 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=''
----
while (p < q && *p)
  ;

whereas it was (correctly) a unary operator:

clang-format-main-20220222-e0219872--style=file --debug-only=format-token-annotator test.cpp
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=while L=5 PPK=2 FakeLParens= FakeRParens=0 II=0x23639666fa0 Text='while'
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=l_paren L=7 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
 M=0 C=1 T=Unknown S=0 F=0 B=0 BK=0 P=59 Name=identifier L=8 PPK=2 FakeLParens=10/5/ FakeRParens=0 II=0x236396a30a0 Text='p'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=50 Name=less L=10 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='<'
 M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=50 Name=identifier L=12 PPK=2 FakeLParens= FakeRParens=1 II=0x236396a30d0 Text='q'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=45 Name=ampamp L=15 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='&&'
 M=0 C=1 T=BinaryOperator S=1 F=0 B=0 BK=0 P=45 Name=star L=17 PPK=2 FakeLParens=14/ FakeRParens=0 II=0x0 Text='*'
 M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=54 Name=identifier L=19 PPK=2 FakeLParens= FakeRParens=2 II=0x236396a30a0 Text='p'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=43 Name=r_paren L=20 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
----
AnnotatedTokens(L=1):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=semi L=1 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
----
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=eof L=0 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=''
----
while (p < q && * p)
  ;

It appears that this causes a regression by adding a space in-between pointer dereferences *p -> * p in some cases, e.g. formatting this with llvm style:

// before
void f() {
  while (p < a && *p == 'a')
    p++;
}
// after
void f() {
  while (p < a && * p == 'a')
    p++;
}

* in *p is indeed annotated as a binary operator now:

clang-format-main-20220110-5ff916ab72b26e667bd5d2e4a762650ba479c781--style=file --debug-only=format-token-annotator test.cpp
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=while L=5 PPK=2 FakeLParens= FakeRParens=0 II=0x2f4bf80a6a0 Text='while'
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=l_paren L=7 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
 M=0 C=1 T=Unknown S=0 F=0 B=0 BK=0 P=59 Name=identifier L=8 PPK=2 FakeLParens=10/5/ FakeRParens=0 II=0x2f4bf806fd8 Text='p'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=50 Name=less L=10 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='<'
 M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=50 Name=identifier L=12 PPK=2 FakeLParens= FakeRParens=1 II=0x2f4bf807008 Text='q'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=45 Name=ampamp L=15 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='&&'
 M=0 C=1 T=UnaryOperator S=1 F=0 B=0 BK=0 P=45 Name=star L=17 PPK=2 FakeLParens=0/ FakeRParens=0 II=0x0 Text='*'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=100 Name=identifier L=18 PPK=2 FakeLParens= FakeRParens=2 II=0x2f4bf806fd8 Text='p'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=43 Name=r_paren L=19 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
----
AnnotatedTokens(L=1):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=semi L=1 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
----
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=eof L=0 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=''
----
while (p < q && *p)
  ;

whereas it was (correctly) a unary operator:

clang-format-main-20220222-e0219872--style=file --debug-only=format-token-annotator test.cpp
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=while L=5 PPK=2 FakeLParens= FakeRParens=0 II=0x23639666fa0 Text='while'
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=23 Name=l_paren L=7 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
 M=0 C=1 T=Unknown S=0 F=0 B=0 BK=0 P=59 Name=identifier L=8 PPK=2 FakeLParens=10/5/ FakeRParens=0 II=0x236396a30a0 Text='p'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=50 Name=less L=10 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='<'
 M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=50 Name=identifier L=12 PPK=2 FakeLParens= FakeRParens=1 II=0x236396a30d0 Text='q'
 M=0 C=0 T=BinaryOperator S=1 F=0 B=0 BK=0 P=45 Name=ampamp L=15 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='&&'
 M=0 C=1 T=BinaryOperator S=1 F=0 B=0 BK=0 P=45 Name=star L=17 PPK=2 FakeLParens=14/ FakeRParens=0 II=0x0 Text='*'
 M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=54 Name=identifier L=19 PPK=2 FakeLParens= FakeRParens=2 II=0x236396a30a0 Text='p'
 M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=43 Name=r_paren L=20 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
----
AnnotatedTokens(L=1):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=semi L=1 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
----
AnnotatedTokens(L=0):
 M=0 C=0 T=Unknown S=1 F=0 B=0 BK=0 P=0 Name=eof L=0 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=''
----
while (p < q && * p)
  ;

That means someone guesses this is a binary op and guesses wrong. I try to take a look at it.

That means someone guesses this is a binary op and guesses wrong. I try to take a look at it.

Yeah, I just found out that it is set in determineStarAmpUsage here:

if (PrevToken->Tok.isLiteral() ||
    PrevToken->isOneOf(tok::r_paren, tok::r_square, tok::kw_true,
                       tok::kw_false, tok::r_brace) ||
    NextToken->Tok.isLiteral() ||
    NextToken->isOneOf(tok::kw_true, tok::kw_false) ||
    NextToken->isUnaryOperator() ||
    // If we know we're in a template argument, there are no named
    // declarations. Thus, having an identifier on the right-hand side
    // indicates a binary operator.
    (InTemplateArgument && NextToken->Tok.isAnyIdentifier()))
  return TT_BinaryOperator;

Strangely, InTemplateArgument is true here, but not sure how that happens, as it wasn't something you changed directly.

@HazardyKnusperkeks, forget my previous comment, I think I found what changed.
Before, the * was annotated twice, first as a BinaryOperator, then, on a second run, it was correctly annotated as a UnaryOperator.
In this revision you whitelisted BinaryOperator in resetTokenMetadata, so its type is determined once only.

Yep, confirmed, but I don't know how to fix it.
Just removing TT_BinaryOperator from resetTokenMetadata provokes these failures:

[  FAILED  ] FormatTest.Concepts
[  FAILED  ] FormatTest.RequiresClauses
[  FAILED  ] TokenAnnotatorTest.UnderstandsRequiresClausesAndConcepts

Hopefully you have some idea how to fix it...

Oh, BTW, here's the test for the regression that Krasimir found:

TEST_F(TokenAnnotatorTest, UnderstandsLoops) {
  auto Tokens = annotate("while (p < q && *p == 0) {}");
  EXPECT_EQ(Tokens.size(), 14u) << Tokens;
  EXPECT_TOKEN(Tokens[6], tok::star, TT_UnaryOperator);
}

Yep, confirmed, but I don't know how to fix it.
Just removing TT_BinaryOperator from resetTokenMetadata provokes these failures:

[  FAILED  ] FormatTest.Concepts
[  FAILED  ] FormatTest.RequiresClauses
[  FAILED  ] TokenAnnotatorTest.UnderstandsRequiresClausesAndConcepts

Hopefully you have some idea how to fix it...

Oh, BTW, here's the test for the regression that Krasimir found:

TEST_F(TokenAnnotatorTest, UnderstandsLoops) {
  auto Tokens = annotate("while (p < q && *p == 0) {}");
  EXPECT_EQ(Tokens.size(), 14u) << Tokens;
  EXPECT_TOKEN(Tokens[6], tok::star, TT_UnaryOperator);
}

This is addressed in D120511.